🎉 New Source: Orb #9985
Conversation
{
  "stream": {
    "name": "customers",
    "json_schema": {
By the way, all the `json_schema` fields in the integration test catalog can be replaced with an empty object. The full schema will be populated automatically.
Suggested change:
- "json_schema": {
+ "json_schema": {},
I went ahead and removed the `json_schema` from the integration test catalog as you suggested.
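For reference, here is a minimal sketch of what a configured-catalog stream entry could look like with the schema left empty (the surrounding fields beyond those shown in the snippet above are illustrative assumptions, and the full schema is filled in automatically):

```python
# Illustrative configured-catalog entry with an empty json_schema.
# Airbyte populates the full schema automatically during the test.
catalog_stream = {
    "stream": {
        "name": "customers",
        "json_schema": {},
        "supported_sync_modes": ["full_refresh", "incremental"],
    },
    "sync_mode": "incremental",
    "destination_sync_mode": "append",
}
```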
Note that the Credits Ledger Entries must read all Customers for an incremental sync, but will only incrementally return new ledger entries for each customer.

Since the Orb API does not allow querying objects based on `updated_at`, these incremental syncs will capture updates to newly created objects but not resources updated after object creation.
How often do we expect "updates" on objects after creation?
How do we deal with those "rare" situations if that happens?
When running in incremental mode, should the connector (or the user, via a second connection) occasionally do a full refresh at a lower frequency to verify that we are not missing such updates?
This will depend on the resource:
I expect the `Subscription` and `Customer` resources will indeed change often (e.g. the state of a subscription may become inactive, or a Customer may have a new shipping address). On the other hand, Credit Ledger Entries are immutable, and I do not expect the `Plan` resource to change often either.
I'll go ahead and document that in this bootstrap.md file!
As you said, I would expect there to be a second connection that occasionally runs to resync all data. The other option is for me to implement a "lookback" feature similar to the Stripe connector (which features a `lookback_window_days` to always resync n days in the past). Do you recommend adding that parameter, or is it common to have two connections in this situation?
It's probably better to implement the lookback feature as you described, directly as a connector option.
That way, the limitation is obvious to the user when configuring this connector, rather than expecting them to know (or guess) that they need a second full-refresh connection for such cases.
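The lookback idea discussed above could be sketched roughly like this (a minimal illustration; the `effective_sync_start` helper, the `lookback_window_days` option name, and the `created_at` cursor field are assumptions here, not the actual Orb connector code):

```python
from datetime import datetime, timedelta
from typing import Mapping, Optional

def effective_sync_start(stream_state: Mapping[str, str], lookback_window_days: int = 0) -> Optional[datetime]:
    """Return the timestamp to sync from: the saved cursor shifted back by the lookback window."""
    cursor = stream_state.get("created_at")
    if cursor is None:
        return None  # no saved state yet -> sync everything from the beginning
    # Re-reading the last N days catches records created just before the cursor
    # but updated afterwards, at the cost of some duplicate reads.
    return datetime.fromisoformat(cursor) - timedelta(days=lookback_window_days)
```

With `lookback_window_days=0` this degrades to a plain incremental sync, so the option is backwards-compatible by default.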
"type": "object",
"properties": { "id": { "type": "string" } }
- Does this single field really need to be nested inside this stream?
- Could it be simplified to a `customer_id` instead?
- Or do we expect to add more columns to the nested `customer` object? In that case, could the `customer` object be added later on with extra properties when that happens?

It would make it much easier / cleaner to manipulate at the destination if we could flatten it a bit at the source level (every nested object will be extracted into its own table in certain destinations).
Hi @ChristopheDuong, that's a great point - if we want to keep just the `customer_id` here, would you recommend that we implement this with a schema transformer, or is there a different recommended solution? Let me know if there's an instructive example to look at for this sort of transform.
I believe it would be nice to maintain a `customer_id` on the `Subscription` resource rather than removing it, because synced Subscriptions in a destination may be more useful with a `customer_id` field.
@ChristopheDuong I looked through existing usages of the custom normalizer, and I think it may be possible to do this by inspecting the field subschema and flattening the resource if the subschema contains an `id` field.
Is that the approach you'd recommend? I'm curious whether there's a more standardized way to flatten responses, as other APIs return nested resources as well.
@ChristopheDuong just following up here to see if the use of the custom transform to accomplish this sounds reasonable to you!
Sorry I haven't answered right away, as I'm not too familiar with the custom normalizer.
It does sound like a reasonable approach to me, though.
No problem - I'll give that a try!
@kgrover the schema transformer only coerces data types to match something declared in the catalog, so it wouldn't achieve the flattening you're after.
For an example, the Google Ads connector achieves this via custom code. See here. It's done in a bit of a complicated way: fields are declared in the schema JSON with dots in their names, e.g. `customer.details.first_name`. The dots denote nesting in the response returned by the API. The connector then reads the field name in the schema and knows how to recursively de-nest things from the API response.
For now, unfortunately, your path forward is also via a custom transformation.
I created a relevant issue here to address this need out of the box in the CDK.
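For illustration, the dotted-name de-nesting described above could be sketched like this (a simplified assumption of the technique, not the Google Ads connector's actual code; the helper names are made up):

```python
from typing import Any, Iterable, Mapping

def get_nested(record: Mapping[str, Any], dotted_name: str) -> Any:
    """Resolve a dotted schema field name like 'customer.id' against a nested API response."""
    value: Any = record
    for part in dotted_name.split("."):
        if not isinstance(value, Mapping):
            return None  # path runs past a non-object value
        value = value.get(part)
    return value

def flatten(record: Mapping[str, Any], dotted_names: Iterable[str]) -> dict:
    """Produce a flat record keyed by the dotted names declared in the schema."""
    return {name: get_nested(record, name) for name in dotted_names}
```

For example, flattening a subscription record with `["id", "customer.id"]` yields a flat row with a `customer.id` column instead of a nested `customer` object, which destinations can store in a single table.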
"end_date": { "type": ["null", "string"], "format": "date-time" },
"plan": {
  "type": "object",
  "properties": { "id": { "type": "string" } }
Same question as with the `customer` field.
There are a bunch more of these, so I'll stop here haha
"type": ["array"],
"items": {
  "type": "object",
  "properties": { "id": { "type": "string" } }
Here, since it's an array, it seems fine I guess.
"properties": {
  "api_key": {
    "type": "string",
    "title": "Orb API Key",
    "description": "Orb API Key, issued from the Orb admin console.",
    "airbyte_secret": true,
    "order": 1
  }
}
Should there be an option to specify a "minimum" date to start sync from if someone doesn't wish to get all data from the beginning?
Yes, that sounds reasonable; I can add that as a parameter to the connector to avoid syncing from the beginning.
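A rough sketch of how such a minimum start date option could be applied when emitting records (the `start_date` option name and `created_at` field are illustrative assumptions, not the connector's final implementation):

```python
from datetime import datetime
from typing import Any, Iterable, Iterator, Mapping

def filter_from_start_date(records: Iterable[Mapping[str, Any]], start_date: str) -> Iterator[Mapping[str, Any]]:
    """Yield only records created on or after the configured start_date (ISO 8601)."""
    cutoff = datetime.fromisoformat(start_date)
    for record in records:
        # Skip anything created before the user's configured minimum date,
        # so a first sync doesn't pull the full account history.
        if datetime.fromisoformat(record["created_at"]) >= cutoff:
            yield record
```

If the API supports a server-side created-after filter, pushing the cutoff into the request parameters would be cheaper than filtering client-side like this.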
"source_defined_primary_key": [["id"]]
},
{
  "name": "credits_ledger_entries",
@kgrover would it be possible to include the usage event properties that we sent to Orb in the credit ledger entry output? This would give us more flexibility to analyze the usage data, e.g. being able to group the output on event properties, or use the event properties to join this data against other tables
@lmossman thanks for bringing this up - let me chat with the team and get back to you on this request!
What
This adds an HTTP source connector for the Orb billing service.
How
This adds incremental streams for the following resources in the Orb data model:
(1) Subscriptions
(2) Plans
(3) Customers
(4) Credit Ledger Entries
Recommended reading order
`source.py` contains the relevant code for this connector, including the concrete implementation of all incremental streams.

🚨 User Impact 🚨
There should be no user impact as a result of this addition.
Pre-merge Checklist
Expand the relevant checklist and delete the others.
New Connector
Community member or Airbyter
- Secrets in the connector's spec are annotated with `airbyte_secret`
- Integration tests pass (`./gradlew :airbyte-integrations:connectors:<name>:integrationTest`)
- Documentation updated:
  - Connector's `README.md`
  - Connector's `bootstrap.md`. See description and examples
  - `docs/SUMMARY.md`
  - `docs/integrations/<source or destination>/<name>.md`, including changelog. See changelog example
  - `docs/integrations/README.md`
  - `airbyte-integrations/builds.md`
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- `/test connector=connectors/<name>` command is passing
- `/publish` command described here