🎉 New Source: Orb #9985


Merged: 8 commits, Feb 21, 2022

Conversation

@kgrover (Contributor) commented Feb 1, 2022

What

This adds an HTTP source connector for the Orb billing service.

How

This adds incremental streams for the following resources in the Orb data model:
(1) Subscriptions
(2) Plans
(3) Customers
(4) Credit Ledger Entries

Recommended reading order

  1. source.py contains the relevant code for this connector, including the concrete implementation of all incremental streams
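As a rough illustration of the incremental pattern used by these streams (a simplified, hypothetical sketch, not the actual source.py implementation): each stream pages through the Orb API and advances a `created_at` cursor so that subsequent syncs only emit new records.

```python
# Hypothetical, simplified sketch of an incremental read loop. `fetch_page`
# stands in for the Orb API call; `state` holds the last-seen `created_at`
# cursor (an ISO-8601 string), so only newer records are emitted.
def read_incremental(fetch_page, state):
    cursor = state.get("created_at")
    max_seen = cursor
    page_token = None
    while True:
        records, page_token = fetch_page(page_token)
        for record in records:
            created = record["created_at"]
            if cursor is None or created > cursor:
                yield record
            if max_seen is None or created > max_seen:
                max_seen = created
        if page_token is None:
            break
    state["created_at"] = max_seen
```

ISO-8601 timestamps in a uniform format compare correctly as strings, which keeps this sketch dependency-free.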

🚨 User Impact 🚨

There should be no user impact as a result of this addition.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of local success, e.g. a screenshot or copy-pasted unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow the instructions in the README. For Java connectors, run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/SUMMARY.md
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

@CLAassistant commented Feb 1, 2022

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Feb 1, 2022
@kgrover (Contributor, Author) commented Feb 1, 2022

Test results (unit tests and acceptance tests) are attached below as screenshots.

Screen Shot 2022-02-01 at 3 33 10 PM

Screen Shot 2022-02-01 at 3 32 10 PM

{
"stream": {
"name": "customers",
"json_schema": {
Reviewer (Contributor):

By the way, all the json_schema fields in the integration test catalog can be replaced with an empty object. The full schema will be populated automatically.

Suggested change
"json_schema": {
"json_schema": {},

@kgrover (Contributor, Author):

I went ahead and removed the json_schema from the integration test catalog as you suggested.

Comment on lines +10 to +12
Note that the Credits Ledger Entries must read all Customers for an incremental sync, but will only incrementally return new ledger entries for each customer.

Since the Orb API does not allow querying objects based on `updated_at`, these incremental syncs will capture updates to newly created objects but not resources updated after object creation.
@ChristopheDuong (Contributor) commented Feb 3, 2022:

How often do we expect "updates" on objects after creation?

How do we deal with those "rare" situations if that happens?

When running in incremental, should the connector (or should the user have a second connection) do a full refresh occasionally to verify that we are not missing such updates at a lower frequency?

@kgrover (Contributor, Author):

This will depend on the resource:

I expect the Subscription and Customer resources will indeed change (e.g. the state of a subscription may become inactive, or a Customer may have a new shipping address) often. On the other hand, Credit Ledger Entries are immutable and I do not expect the Plan resource to change often either.

I'll go ahead and document that in this bootstrap.md file!

As you said, I would expect there to be a second connection that occasionally runs to resync all data. The other option is for me to implement a "lookback" feature similar to the Stripe Connector (which features a lookback_window_days to always resync n days in the past). Do you recommend adding that parameter, or is it common to have two connections in this situation?

Reviewer (Contributor):

It's probably better to implement the lookback feature you described directly in the connector's options.

That way, this limitation is more obvious to the user when configuring the connector, rather than expecting them to know and guess that they need a second full-refresh connection for such cases.
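For illustration, a minimal sketch of what such a lookback option could look like, assuming a hypothetical lookback_window_days config parameter mirroring the Stripe connector's:

```python
from datetime import datetime, timedelta

# Hypothetical sketch: before querying, rewind the saved cursor by the
# configured lookback window so records created shortly before the last
# sync are fetched again, catching late updates within the window.
def effective_cursor(saved_cursor, lookback_window_days):
    if saved_cursor is None or lookback_window_days == 0:
        return saved_cursor
    cursor = datetime.fromisoformat(saved_cursor)
    return (cursor - timedelta(days=lookback_window_days)).isoformat()
```

A lookback of zero (the presumed default) keeps today's behavior, so existing users would be unaffected.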

Comment on lines +57 to +58
"type": "object",
"properties": { "id": { "type": "string" } }
@ChristopheDuong (Contributor) commented Feb 3, 2022:

  • Does this single field really need to be nested inside this stream?
  • Could it be possible to simplify it by a customer_id instead?
  • or do we expect to add more columns to the nested customer object? In that case, the customer object could be added later on with extra properties when that happens?

It would make it much easier / cleaner to manipulate at the destination if we could flatten it a bit at the source level (every nested object will be extracted in its own table in certain destinations)

@kgrover (Contributor, Author):

Hi @ChristopheDuong that's a great point - if we want to keep just the customer_id here, would you recommend that we implement this with a schema transformer, or is there a different recommended solution? Let me know if there's an instructive example to look at for this sort of transform.

I believe it would be nice to maintain a customer_id on the Subscription resource rather than removing it because synced Subscriptions in a destination may be more useful with a customer_id field.
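For what it's worth, a hypothetical record-level transform along these lines (the function name and shape here are illustrative, not an Airbyte API) could collapse any single-field nested object into a flat *_id column:

```python
# Hypothetical sketch: flatten nested single-field objects such as
# {"customer": {"id": "cus_9"}} into a top-level "customer_id" column,
# which is easier to work with in destinations that extract every
# nested object into its own table.
def flatten_single_id_fields(record):
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict) and set(value) == {"id"}:
            flat[f"{key}_id"] = value["id"]
        else:
            flat[key] = value
    return flat
```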

@kgrover (Contributor, Author):

@ChristopheDuong I attempted to look through existing usages of the custom normalizer, and I think it may be possible to do this by inspecting the field subschema and flattening the resource if the subschema contains an id field.

Is that the approach you'd recommend? I'm curious if there's a more standardized way to flatten responses as other APIs return nested resources as well.

@kgrover (Contributor, Author):

@ChristopheDuong just following up here to see if the use of the custom transform to accomplish this sounds reasonable to you!

Reviewer (Contributor):

Sorry, I haven't answered right away as I'm not too familiar with the custom normalizer.

I guess it does sound like a reasonable approach to me, though.

@kgrover (Contributor, Author):

No problem - I'll give that a try!

Reviewer (Contributor):

@kgrover the schema transformer only coerces data types to match something declared in the catalog, so it wouldn't achieve the flattening you're after.

For example, the Google Ads connector achieves this via custom code (see here). It's done in a somewhat complicated way: fields are declared in the schema JSON with dots in their names, e.g. customer.details.first_name. The dots denote nesting in the response returned by the API; the connector reads the field name from the schema and knows how to recursively denest values from the API response.

For now, unfortunately, your path forward is also via a custom transformation.

I created a relevant issue here to address this need out of the box in the CDK.
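The dotted-field-name convention described above can be sketched roughly like this (a simplified illustration, not the Google Ads connector's actual code):

```python
# Sketch: a schema field named "customer.details.first_name" denotes a
# path into the nested API response; walk the path to denest the value.
def denest(response, dotted_field):
    value = response
    for part in dotted_field.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(part)
        if value is None:
            return None
    return value
```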

"end_date": { "type": ["null", "string"], "format": "date-time" },
"plan": {
"type": "object",
"properties": { "id": { "type": "string" } }
@ChristopheDuong (Contributor) commented Feb 3, 2022:

Same question as with the customer field. There are a bunch more of these; I'll stop here, haha.

"type": ["array"],
"items": {
"type": "object",
"properties": { "id": { "type": "string" } }
Reviewer (Contributor):

Here, since it's an array, it seems fine, I guess.

Comment on lines 9 to 17
"properties": {
"api_key": {
"type": "string",
"title": "Orb API Key",
"description": "Orb API Key, issued from the Orb admin console.",
"airbyte_secret": true,
"order": 1
}
}
@ChristopheDuong (Contributor) commented Feb 3, 2022:

Should there be an option to specify a "minimum" date to start sync from if someone doesn't wish to get all data from the beginning?

@kgrover (Contributor, Author):

Yes, that sounds reasonable; I can add that as a parameter to the connector to avoid syncing from the beginning.
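For illustration, such a hypothetical start_date option would only matter on the very first sync, before any state exists:

```python
# Hypothetical sketch: when no saved state exists yet, start the first
# sync from the configured start_date instead of the beginning of the
# account's history; afterwards the saved cursor always wins.
def initial_cursor(config, state):
    saved = state.get("created_at")
    if saved is not None:
        return saved
    return config.get("start_date")  # ISO-8601 string, or None to sync everything
```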

"source_defined_primary_key": [["id"]]
},
{
"name": "credits_ledger_entries",
@lmossman (Contributor):

@kgrover would it be possible to include the usage event properties that we sent to Orb in the credit ledger entry output? This would give us more flexibility to analyze the usage data, e.g. being able to group the output on event properties, or use the event properties to join this data against other tables

@kgrover (Contributor, Author):

@lmossman thanks for bringing this up - let me chat with the team and get back to you on this request!

8 participants