🎉 New Source: Orb #9985
Conversation
{
  "stream": {
    "name": "customers",
    "json_schema": {
By the way, all the `json_schema` fields in the integration test catalog can be replaced with an empty object. The full schema will be populated automatically.
Suggested change:
- "json_schema": {
+ "json_schema": {},
I went ahead and removed the `json_schema` from the integration test catalog as you suggested.
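For reference, here is a minimal sketch of what a configured-catalog stream entry could look like with the schema left empty (the surrounding fields beyond those shown in the snippet above are illustrative assumptions, and the full schema is filled in automatically):

```python
# Illustrative configured-catalog entry with an empty json_schema.
# Airbyte populates the full schema automatically during the test.
catalog_stream = {
    "stream": {
        "name": "customers",
        "json_schema": {},
        "supported_sync_modes": ["full_refresh", "incremental"],
    },
    "sync_mode": "incremental",
    "destination_sync_mode": "append",
}
```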
Note that the Credits Ledger Entries must read all Customers for an incremental sync, but will only incrementally return new ledger entries for each customer.

Since the Orb API does not allow querying objects based on `updated_at`, these incremental syncs will capture updates to newly created objects but not resources updated after object creation.
How often do we expect "updates" on objects after creation?
How do we deal with those "rare" situations if that happens?
When running in incremental mode, should the connector (or the user, via a second connection) occasionally do a full refresh at a lower frequency to verify that we are not missing such updates?
This will depend on the resource:
I expect the `Subscription` and `Customer` resources will indeed change often (e.g. the state of a subscription may become inactive, or a Customer may have a new shipping address). On the other hand, Credit Ledger Entries are immutable, and I do not expect the `Plan` resource to change often either.
I'll go ahead and document that in this bootstrap.md file!
As you said, I would expect there to be a second connection that occasionally runs to resync all data. The other option is for me to implement a "lookback" feature similar to the Stripe connector (which features a `lookback_window_days` to always resync n days in the past). Do you recommend adding that parameter, or is it common to have two connections in this situation?
It's probably better to implement the lookback feature as you described, directly as a connector option.
That way, the limitation is obvious to the user when configuring this connector, rather than expecting them to know (or guess) that they need a second full-refresh connection for such cases.
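The lookback idea discussed above could be sketched roughly like this (a minimal illustration; the `effective_sync_start` helper, the `lookback_window_days` option name, and the `created_at` cursor field are assumptions here, not the actual Orb connector code):

```python
from datetime import datetime, timedelta
from typing import Mapping, Optional

def effective_sync_start(stream_state: Mapping[str, str], lookback_window_days: int = 0) -> Optional[datetime]:
    """Return the timestamp to sync from: the saved cursor shifted back by the lookback window."""
    cursor = stream_state.get("created_at")
    if cursor is None:
        return None  # no saved state yet -> sync everything from the beginning
    # Re-reading the last N days catches records created just before the cursor
    # but updated afterwards, at the cost of some duplicate reads.
    return datetime.fromisoformat(cursor) - timedelta(days=lookback_window_days)
```

With `lookback_window_days=0` this degrades to a plain incremental sync, so the option is backwards-compatible by default.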
"type": "object",
"properties": { "id": { "type": "string" } }
- Does this single field really need to be nested inside this stream?
- Could it be simplified to a `customer_id` instead?
- Or do we expect to add more columns to the nested `customer` object? In that case, could the `customer` object be added later on with extra properties when that happens?

It would make it much easier / cleaner to manipulate at the destination if we could flatten it a bit at the source level (every nested object will be extracted into its own table in certain destinations).
Hi @ChristopheDuong, that's a great point - if we want to keep just the `customer_id` here, would you recommend that we implement this with a schema transformer, or is there a different recommended solution? Let me know if there's an instructive example to look at for this sort of transform.
I believe it would be nice to maintain a `customer_id` on the `Subscription` resource rather than removing it, because synced Subscriptions in a destination may be more useful with a `customer_id` field.
@ChristopheDuong I looked through existing usages of the custom normalizer, and I think it may be possible to do this by inspecting the field subschema and flattening the resource if the subschema contains an `id` field.
Is that the approach you'd recommend? I'm curious whether there's a more standardized way to flatten responses, as other APIs return nested resources as well.
@ChristopheDuong just following up here to see if the use of the custom transform to accomplish this sounds reasonable to you!
Sorry I haven't answered right away, as I'm not too familiar with the custom normalizer.
It does sound like a reasonable approach to me, though.
No problem - I'll give that a try!
@kgrover the schema transformer only coerces data types to match something declared in the catalog, so it wouldn't achieve the flattening you're after.
For an example, the Google Ads connector achieves this via custom code. See here. It's done in a bit of a complicated way: fields are declared in the schema JSON with dots in their names, e.g. `customer.details.first_name`. The dots denote nesting in the response returned by the API. The connector then reads the field name in the schema and knows how to recursively de-nest things from the API response.
For now, unfortunately, your path forward is also via a custom transformation.
I created a relevant issue here to address this need out of the box in the CDK.
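For illustration, the dotted-name de-nesting described above could be sketched like this (a simplified assumption of the technique, not the Google Ads connector's actual code; the helper names are made up):

```python
from typing import Any, Iterable, Mapping

def get_nested(record: Mapping[str, Any], dotted_name: str) -> Any:
    """Resolve a dotted schema field name like 'customer.id' against a nested API response."""
    value: Any = record
    for part in dotted_name.split("."):
        if not isinstance(value, Mapping):
            return None  # path runs past a non-object value
        value = value.get(part)
    return value

def flatten(record: Mapping[str, Any], dotted_names: Iterable[str]) -> dict:
    """Produce a flat record keyed by the dotted names declared in the schema."""
    return {name: get_nested(record, name) for name in dotted_names}
```

For example, flattening a subscription record with `["id", "customer.id"]` yields a flat row with a `customer.id` column instead of a nested `customer` object, which destinations can store in a single table.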
"end_date": { "type": ["null", "string"], "format": "date-time" },
"plan": {
  "type": "object",
  "properties": { "id": { "type": "string" } }
Same question as with the `customer` field.
There are a bunch more of these, so I'll stop here haha
"type": ["array"],
"items": {
  "type": "object",
  "properties": { "id": { "type": "string" } }
Here, since it's an array, it seems fine I guess.
"properties": {
  "api_key": {
    "type": "string",
    "title": "Orb API Key",
    "description": "Orb API Key, issued from the Orb admin console.",
    "airbyte_secret": true,
    "order": 1
  }
}
Should there be an option to specify a "minimum" date to start sync from if someone doesn't wish to get all data from the beginning?
Yes, that sounds reasonable; I can add that as a parameter to the connector to avoid syncing from the beginning.
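A rough sketch of how such a minimum start date option could be applied when emitting records (the `start_date` option name and `created_at` field are illustrative assumptions, not the connector's final implementation):

```python
from datetime import datetime
from typing import Any, Iterable, Iterator, Mapping

def filter_from_start_date(records: Iterable[Mapping[str, Any]], start_date: str) -> Iterator[Mapping[str, Any]]:
    """Yield only records created on or after the configured start_date (ISO 8601)."""
    cutoff = datetime.fromisoformat(start_date)
    for record in records:
        # Skip anything created before the user's configured minimum date,
        # so a first sync doesn't pull the full account history.
        if datetime.fromisoformat(record["created_at"]) >= cutoff:
            yield record
```

If the API supports a server-side created-after filter, pushing the cutoff into the request parameters would be cheaper than filtering client-side like this.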
"source_defined_primary_key": [["id"]]
},
{
  "name": "credits_ledger_entries",
@kgrover would it be possible to include the usage event properties that we sent to Orb in the credit ledger entry output? This would give us more flexibility to analyze the usage data, e.g. being able to group the output on event properties, or use the event properties to join this data against other tables
@lmossman thanks for bringing this up - let me chat with the team and get back to you on this request!
What
This adds an HTTP source connector for the Orb billing service.
How
This adds incremental streams for the following resources in the Orb data model:
(1) Subscriptions
(2) Plans
(3) Customers
(4) Credit Ledger Entries
Recommended reading order
`source.py` contains the relevant code for this connector, including the concrete implementation of all incremental streams.

🚨 User Impact 🚨
There should be no user impact as a result of this addition.
Pre-merge Checklist
Expand the relevant checklist and delete the others.
New Connector
Community member or Airbyter
- Secrets in the connector's spec are annotated with `airbyte_secret`
- Integration tests pass (`./gradlew :airbyte-integrations:connectors:<name>:integrationTest`)
- Documentation updated:
  - Connector's `README.md`
  - Connector's `bootstrap.md`. See description and examples
  - `docs/SUMMARY.md`
  - `docs/integrations/<source or destination>/<name>.md`, including changelog. See changelog example
  - `docs/integrations/README.md`
  - `airbyte-integrations/builds.md`
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- `/test connector=connectors/<name>` command is passing
- `/publish` command described here