Skip to content

Destination Databricks : Allow Incremental Deduped sync #14445

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

shrodingers
Copy link
Contributor

@shrodingers shrodingers commented Jul 6, 2022

What

Enables the incremental deduped sync on databricks connector

How

  • Adds databricks to the list of connectors supporting normalization
  • Completes / adds dbt macros to be compatible with databricks
  • Generates the normalization models with merge strategy
  • Modifies the underlying storage and the way the connector works (used to use parquet adapter which expanded the first rows, and having for that some schema inferring mechanisms, which are incompatible with the raw data and stream schema needed for normalization models to work), thus going from parquet staging storage to CSV staging (CSV connectors let the choice to expand headers rows or not)

Recommended reading order

  1. dbt added macros / models
  2. normalization updates
  3. connector updates

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

There are breaking changes, since it totally modifies the way data is stored through the connector (may be a good idea to separate this connector from the original one to be not breaking)

Pre-merge Checklist

Expand the relevant checklist and delete the others.

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Tests

Unit

Put your unit tests output here.

Integration

Put your integration tests output here.

Acceptance

Put your acceptance tests output here.

@shrodingers
Copy link
Contributor Author

@tuliren, sent you an invite for collaboration :)

@sajarin sajarin removed their assignment Sep 9, 2022
@shrodingers shrodingers requested review from a team as code owners September 20, 2022 13:09
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ shrodingers
❌ EC2 Default User


EC2 Default User seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@marcosmarxm
Copy link
Member

Hello @shrodingers Liren is actively working in the Databricks destination and will return to your contribution in the future.

@shrodingers
Copy link
Contributor Author

Hello @marcosmarxm ! Thanks for the update, that's good news ! While running these changes as a separate custom connector, I stumbled upon some issues / some whereabouts so at the time this will be considered, i'll be more than happy to help and discuss pros and cons of these changes.

@krishnaglick
Copy link
Contributor

I glanced through the FE changes and it looks like airbyte-webapp was hit with a linter it shouldn't be. Unsure if there were actual FE changes but what I did see was simply linting changes that should be pulled out.

@marcosmarxm
Copy link
Member

Hello 👋, first thank you for this amazing contribution.

We really appreciate the effort you've made to improve the project.
We ask you patience for the code review. Last month our team was focused on Hacktoberfest event and that probably left some PR without the proper feedback. And this week, due to the Thanksgiving US Holiday, most our team is out of office with their families. Another important piece of information why code won't be merge this week is: as a safety measure the core team has decided to freeze merging code to main branch to keep the release stable. Next week we'll return to you with the proper code review and update the status of your contribution.

If you have any questions feel free to send me a message in Slack!
Thanks!

@shrodingers
Copy link
Contributor Author

shrodingers commented Nov 25, 2022

Hello @krishnaglick, @marcosmarxm !
Thanks for the update. To answer you about these, I think this pr is more intended as a demonstration of what I wanted to achieve and how I did some stuff, won't be mergeable like that so I think it will be a totally different one with the correct changes in the end (so no changes intended in UI, ofc, that was just a tentative to have the databricks logo in UI that led me to do this).
Used it in production for a while but finally rolled it back for some issues arised that I didn't have the platform knowledge to easily fix. But I still think a certain portions of this poc changes will still be usable to implement it, thus this pr. And i'm willing to help when time will come, to make this feature compliant with airbyte-ic way to do this, and have it up-to-date with the refactor that I understood was in progress. I'm also really interested to be able to discuss about some choices that have been made on the first version, and what's planned for this connector :)
Anyways thanks for your time and can't wait to tackle this !

@marcosmarxm
Copy link
Member

Hello 👋:skin-tone-2: and thank you for your contribution!

Airbyte has instituted a code freeze between 19 and 30 December, to make sure there are no disruptions during the holidays.
Because of this, reviewing and merging your contribution may take longer than usual.
We apologize for the delay, but we want everyone to have a quiet and happy holiday.

If you have any questions or need further clarification, please don't hesitate to ping via Slack.

@subodh1810 subodh1810 requested review from a team and removed request for tuliren and a team February 6, 2023 08:56
@natalyjazzviolin natalyjazzviolin removed the area/frontend Related to the Airbyte webapp label Mar 7, 2023
@natalyjazzviolin
Copy link
Contributor

Hey, @shrodingers! It looks like the best course of action is to split this PR up into smaller ones and refactor it. As you yourself mention that it is not mergeable, I'm closing it for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants