Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🧱 private preview with Unity catalog and Oauth using service principal #37613

Merged
merged 38 commits into from
May 19, 2024

Conversation

gisripa
Copy link
Contributor

@gisripa gisripa commented Apr 26, 2024

Review guide

Ignore deletes in src/main/java, /src/test/java, /src/test-integration/java. Old code deleted.
Databricks destination using AsyncConsumer, V2 column names, Stripped down version of classes needed for Staging and TypeDedupe.

User Impact

This is a breaking change and a private preview release. Destination only supports CSV file format at the moment and works with Unity catalog for tables and Volumes. Cloud and OSS are pinned to 1.1.0 which was the last known stable release of the community connector.
🥳

Can this PR be safely reverted and rolled back?

  • NOPEEEE ❌ - 1.1.0 version pinned in Cloud and OSS

Copy link

vercel bot commented Apr 26, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 19, 2024 5:55pm

@octavia-squidington-iii octavia-squidington-iii added area/connectors Connector related issues CDK Connector Development Kit connectors/destination/databricks labels Apr 26, 2024
@gisripa gisripa changed the title Mason at work with bricks AirMason: at work with bricks Apr 26, 2024
@gisripa gisripa changed the title AirMason: at work with bricks AirMason: at work with 🧱 Apr 26, 2024
@gisripa
Copy link
Contributor Author

gisripa commented May 3, 2024

@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label May 9, 2024
@gisripa gisripa changed the title AirMason: at work with 🧱 🧱 - private preview with Unity catalog and Oauth using service principal May 9, 2024
@gisripa gisripa changed the title 🧱 - private preview with Unity catalog and Oauth using service principal 🧱 private preview with Unity catalog and Oauth using service principal May 9, 2024
@gisripa gisripa marked this pull request as ready for review May 9, 2024 18:06
@gisripa gisripa requested review from a team as code owners May 9, 2024 18:06
applicationDefaultJvmArgs = ['-XX:+ExitOnOutOfMemoryError', '-XX:MaxRAMPercentage=75.0']
mainClass = 'io.airbyte.integrations.destination.databricks.DatabricksDestinationKt'
applicationDefaultJvmArgs = ['-XX:+ExitOnOutOfMemoryError', '-XX:MaxRAMPercentage=75.0',
'-XX:NativeMemoryTracking=detail', '-XX:+UnlockDiagnosticVMOptions',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add some comment as to why this is here

import io.airbyte.integrations.destination.databricks.model.OAuth2Authentication
import javax.sql.DataSource

object ConnectorClientsFactory {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do we think about always prefixing our connector-specific classes with the connector name (DatabricksConnectorClientFactory here) ?

connectorConfig.authentication
)
val storageOperations =
DatabricksStorageOperations(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dumb question: is there any case where we'll use the DatabricksStorageOperations with anything else than those particular types? Should we just create them inside that constructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

injected as dependency for mocking them, although didn't write unit test in this 😅 . Will do that in CDK PR when doing the bigquery thing now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll want to experiment with how we test these components. E.g. do we ever want to test sqlgenerator/destinationhandler/etc in isolation, or are we always just testing the entire StorageOperations object?

(my personal guess is that we'll mostly just run integration tests against FooStorageOperations - e.g. our BaseSqlGeneratorIntegrationTest is also expanding to exercise destinationhandler stuff)

also, up for discovery: is it actually just StagingDestinationStorageOperations, and we want to inject databricks stuff through the connector? Or is there actual databricks-specific stuff directly in the StorageOperations impl? (I don't know the answer, this is something we should figure out)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, if we can eventually fold our SqlGen tests as StorageOps tests. TBH it needs for injection are ConnectorConfig static info like schema, catalog, jdbcurl and both SqlGen and DestHandler can be internally instantiated.

Copy link
Contributor

@edgao edgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only skimmed this, fully expecting us to make a lot of changes while we're working on the bigquery stuff. Just a couple nits/info comments.

@@ -23,4 +23,5 @@ data:
sl: 100
ql: 100
supportLevel: community
supportsDbt: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want this? my understanding (80%) is that this triggers platform's custom dbt transform thing, which they recently killed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy-pasta. I'll remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🪓

import io.airbyte.commons.jackson.MoreMappers

data class DatabricksConnectorConfig(
@JsonProperty("accept_terms") val termsAccepted: Boolean,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: our spec.json already has an explicit title field to control what the UI, why not just use camelCase for the object keys?

connectorConfig.authentication
)
val storageOperations =
DatabricksStorageOperations(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll want to experiment with how we test these components. E.g. do we ever want to test sqlgenerator/destinationhandler/etc in isolation, or are we always just testing the entire StorageOperations object?

(my personal guess is that we'll mostly just run integration tests against FooStorageOperations - e.g. our BaseSqlGeneratorIntegrationTest is also expanding to exercise destinationhandler stuff)

also, up for discovery: is it actually just StagingDestinationStorageOperations, and we want to inject databricks stuff through the connector? Or is there actual databricks-specific stuff directly in the StorageOperations impl? (I don't know the answer, this is something we should figure out)

@octavia-squidington-iii octavia-squidington-iii removed the CDK Connector Development Kit label May 10, 2024
@@ -15,12 +15,11 @@
plugins {
id 'application'
id 'airbyte-java-connector'
id "de.undercouch.download" version "5.0.1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, what did this do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😂 no clue

@@ -23,4 +23,5 @@ data:
sl: 100
ql: 100
supportLevel: community
supportsDbt: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🪓

@@ -344,7 +344,8 @@ Delta Lake tables are created. You may want to consult the tutorial on
## CHANGELOG
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be a lot more docs to update once we are ready to release - fine to skip that for now

@gisripa gisripa merged commit c627ce8 into master May 19, 2024
29 checks passed
@gisripa gisripa deleted the gireesh/masonry branch May 19, 2024 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/destination/databricks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants