# Bulk CDK

The Bulk CDK is the "new Java CDK" that's currently incubating.
|
4 |
| -It's written in Kotlin and consists of a _core_ and a bunch of _toolkits_: |
5 |
| -- The _core_ consists of the Micronaut entry point and other objects which are expected in |
6 |
| - connectors built using this CDK. |
7 |
| -- The _toolkits_ consist of optional modules which contain objects which are common across |
8 |
| - multiple (but by no means all) connectors. |
9 |
| - |
10 |
| -While the CDK is incubating, its published version numbers are 0.X where X is monotonically |
11 |
| -increasing based on the maximum version value found on the maven repository that the jars are |
12 |
| -published to: https://airbyte.mycloudrepo.io/public/repositories/airbyte-public-jars/io/airbyte/bulk-cdk/ |
13 |
| - |
14 |
| -Jar publication happens via a github workflow triggered by pushes to the master branch, i.e. after |
15 |
| -merging a pull request. |
As the name suggests, its purpose is to help develop connectors that extract or load data in bulk.
The Bulk CDK is written in Kotlin and uses the Micronaut framework for dependency injection.

## Structure

The Bulk CDK consists of a _core_ and a set of _toolkits_.

### Core

The _core_ consists of the Micronaut entry point and the other objects that are expected to be
present in every connector built using this CDK.

The core is broken down into multiple Gradle projects; for example, the core functionality for
building sources lives in `extract`.

Following up on that example, a source connector is expected to use all the interfaces and
implementations in `extract` unless it has a very good reason not to.
There is plenty of value in having all source connectors behave predictably.

### Toolkits

The _toolkits_ are optional modules containing objects that are common to multiple (but by no
means all) connectors.

For example, there's an `extract-jdbc` toolkit to help build source connectors that extract data
using the JDBC API.
A toolkit is expected to provide naive implementations of core interfaces.
These implementations are thoroughly tested inside the CDK to serve as a baseline of
functionality; however, a connector may (and in fact often should!) replace parts of them.

Following up on the `extract-jdbc` example, a source connector needs to implement the SQL query
generation interfaces and, for schema discovery, may prefer to query system tables directly
instead of relying on the generic JDBC metadata methods.
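
The split between a toolkit's naive implementation and a connector's replacement can be sketched
in plain Kotlin as follows.
All names (`SelectQueryGenerator`, `NaiveSelectQueryGenerator`, `BacktickSelectQueryGenerator`) are
hypothetical and do not reflect the actual `extract-jdbc` interfaces; in a real connector the
replacement would be wired in through Micronaut rather than instantiated by hand.

```kotlin
/** Hypothetical core-style interface: builds a SELECT statement for a table and its columns. */
interface SelectQueryGenerator {
    fun select(table: String, columns: List<String>): String
}

/** Hypothetical toolkit-style baseline: quotes identifiers the ANSI SQL way, with double quotes. */
open class NaiveSelectQueryGenerator : SelectQueryGenerator {
    // Escapes embedded double quotes by doubling them, then wraps the identifier.
    protected open fun quote(identifier: String): String =
        "\"" + identifier.replace("\"", "\"\"") + "\""

    override fun select(table: String, columns: List<String>): String =
        "SELECT ${columns.joinToString(", ") { quote(it) }} FROM ${quote(table)}"
}

/** Hypothetical connector-specific replacement for a dialect that quotes with backticks. */
class BacktickSelectQueryGenerator : NaiveSelectQueryGenerator() {
    // Only the quoting changes; query assembly is inherited from the baseline.
    override fun quote(identifier: String): String =
        "`" + identifier.replace("`", "``") + "`"
}
```

Overriding just the identifier quoting is enough here for the connector to adapt the baseline to a
dialect such as MySQL's while keeping the rest of the toolkit's tested behavior.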

## Dependencies

The Bulk CDK Gradle build relies heavily on so-called [BOM dependencies](https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#bill-of-materials-bom-poms).
This pattern is strongly encouraged because it keeps transitive version conflicts to a minimum,
which benefits, among other things, build reproducibility and security posture.

Consider, for example, the whole Jackson ecosystem.
Using a BOM allows us to add specific Jackson dependencies without having to figure out which
version number to use.
This has some pleasant ripple effects:

- When the time comes to bump the version, there's only one version number to bump, and it's in
  the BOM import.
  Consequently, the declared version is much more likely to be the effective version
  picked by Gradle during dependency resolution.

- The BOM import is re-exported by the `bulk-cdk-core-base` artifact, meaning that the rest of the
  CDK, as well as connectors, don't need to worry about Jackson version numbers either.
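
A minimal sketch of this pattern in a Gradle Kotlin DSL build script follows.
`platform(...)`, `api(...)` and the `java-library` plugin are standard Gradle; the module itself and
the Jackson version are illustrative assumptions, not copied from the Bulk CDK build.
Declaring the BOM with `api` rather than `implementation` is what makes its version constraints
visible to consumers, which is how an artifact like `bulk-cdk-core-base` can re-export them.

```kotlin
// build.gradle.kts of a hypothetical CDK module (illustrative sketch only).
plugins {
    `java-library`
}

dependencies {
    // Import the Jackson BOM; api() exposes its version constraints to consumers.
    // The version number is an assumption chosen for illustration.
    api(platform("com.fasterxml.jackson:jackson-bom:2.17.2"))

    // No version numbers on individual Jackson artifacts: the BOM supplies them.
    api("com.fasterxml.jackson.core:jackson-databind")
    implementation("com.fasterxml.jackson.module:jackson-module-kotlin")
}
```

A connector depending on such a module could then declare, say,
`implementation("com.fasterxml.jackson.core:jackson-databind")` without any version.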

It gets even better when multiple BOMs are involved.
Consider, for example, Micronaut and Jackson: Micronaut itself depends on Jackson.
This can (and will!) cause dependency version conflicts; these are much easier to resolve when it
comes down to reconciling just two BOM versions.
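
When two BOMs pin the same module at different versions, reconciling them starts with spotting
exactly where they disagree.
The sketch below models each BOM as a plain map from `group:name` coordinates to versions; this
model and the `disagreements` function are hypothetical and are not how Gradle represents BOMs
internally.
Left alone, Gradle resolves such a conflict by selecting the highest requested version, which may
not be the version either BOM was tested with.

```kotlin
/**
 * Hypothetical helper: each BOM is modeled as a map from "group:name" coordinates to the
 * version it pins. Returns, for every module pinned by both BOMs at different versions,
 * the pair (version in first BOM, version in second BOM).
 */
fun disagreements(
    first: Map<String, String>,
    second: Map<String, String>
): Map<String, Pair<String, String>> =
    first.keys
        .intersect(second.keys)
        .filter { first.getValue(it) != second.getValue(it) }
        .associateWith { first.getValue(it) to second.getValue(it) }
```

One way to reconcile the two BOMs is to pick versions of each for which this map is empty, or else
to decide explicitly which side wins for each listed module.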

While BOMs are undoubtedly useful, let's still try to keep external dependencies to a minimum
outside of tests.
Fewer dependencies, fewer problems.

## Developing

From a connector developer experience (DX) perspective, perhaps the most striking difference from
the legacy Java CDK is that there is no facility equivalent to `useLocalCdk = true`.

This is deliberate: the intention is to force the testing of CDK functionality to remain in the CDK.
Recall that this is too often not the case in the legacy Java CDK, simply because it's not possible
to do so there.

The Bulk CDK is different.
Dependency injection makes it possible to mock concrete implementation behavior realistically
enough that Bulk CDK tests have entire fake connectors defined inside them.
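
Dependency injection helps because CDK logic written against interfaces can be exercised against
in-memory fakes.
Here is a minimal plain-Kotlin sketch using constructor injection; `RecordSource`, `StreamSyncer`
and `FakeRecordSource` are hypothetical names, and in the Bulk CDK itself such wiring would go
through Micronaut's dependency injection rather than a constructor call.

```kotlin
/** Hypothetical interface that a real connector would implement against its database. */
interface RecordSource {
    fun read(stream: String): Sequence<Map<String, Any?>>
}

/** Hypothetical CDK-side logic under test: it only ever sees the interface. */
class StreamSyncer(private val source: RecordSource) {
    /** Reads every record of each stream and returns the record count per stream. */
    fun sync(streams: List<String>): Map<String, Int> =
        streams.associateWith { source.read(it).count() }
}

/** In-memory fake standing in for a whole connector; no database required. */
class FakeRecordSource(private val data: Map<String, List<Map<String, Any?>>>) : RecordSource {
    // Unknown streams yield no records instead of failing.
    override fun read(stream: String): Sequence<Map<String, Any?>> =
        data[stream].orEmpty().asSequence()
}
```

The CDK-side `StreamSyncer` can then be tested end to end without any database; only the injected
implementation differs between a test and a real connector.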

There's now no reason not to make changes to the CDK and publish them first, and only then make
the downstream changes to a connector.

If there's truly a need to develop both simultaneously, then the way to go may be to:

1. do experimental development in the connector, keeping the CDK-specific and connector-specific
   code separate;
2. once the CDK-specific code is reasonably mature, hoist it into the Bulk CDK and test it there;
3. finally, publish those changes and have the connector depend on the latest Bulk CDK version.

## Publishing

While the CDK is incubating, its published version numbers are `0.X`, where `X` is the _build number_.
This build number is monotonically increasing and is based on the maximum version value found on
the [Maven repository that the jars are published to](https://airbyte.mycloudrepo.io/public/repositories/airbyte-public-jars/io/airbyte/bulk-cdk/).
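
As a rough illustration of this rule, the sketch below derives the next `0.X` version from the
versions already published, assuming the next build number is the current maximum plus one.
The function is hypothetical and is not the logic actually used by the publishing workflow.

```kotlin
/**
 * Hypothetical helper: given the versions already published (e.g. "0.41"), returns the
 * next incubating version, assuming build number = current maximum + 1.
 * Versions not of the form "0.<integer>" are ignored.
 */
fun nextIncubatingVersion(published: Collection<String>): String {
    val maxBuild = published
        .filter { it.startsWith("0.") }
        .mapNotNull { it.removePrefix("0.").toIntOrNull() } // compare as integers, not strings
        .maxOrNull() ?: 0
    return "0.${maxBuild + 1}"
}
```

Comparing build numbers as integers rather than strings matters here: as strings, `0.9` would sort
after `0.12`.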

Artifact publication happens via a [GitHub workflow](../../.github/workflows/publish-bulk-cdk.yml)
that is triggered by any push to the `master` branch, i.e., after merging a pull request.

From a contributor's perspective, this means there's no need to worry about versions or
changelogs.
From a client's perspective, it means simply always using the latest version.

Once the incubation period winds down and the CDK stabilizes, we can start thinking about contracts,
semantic versioning, and so forth, but not until then.

## Licensing

The Bulk CDK is licensed under the Elastic License 2.0, as specified by the `LICENSE` file at the
root of this Git repository.