[source-revenuecat] Missing Schema and no pagination #54713

Closed
1 task done
tomdarmon-appstack opened this issue Feb 27, 2025 · 14 comments · Fixed by #55247

@tomdarmon-appstack

tomdarmon-appstack commented Feb 27, 2025

Connector Name

source-revenuecat

Connector Version

0.0.4

What step the error happened?

During the sync

Relevant information

Hey, I'm trying to use the RevenueCat connector. I think the implementation is pretty outdated; I've observed two big problems:

  • Some streams have no schema, which leads to sync errors when writing to destinations that require one (e.g. BigQuery)
  • There is no pagination, so we only fetch the first page. This makes the connector... useless?

I've been working to fix those issues but I'm having trouble with the pagination. The pagination on RevenueCat works like this: https://www.revenuecat.com/docs/api-v2#tag/Pagination

My idea is to use Cursor Pagination with the custom option to construct the next page cursor, such as:
Cursor Value = {{ last_record["id"] }}
Stop Condition = {{response['next_page'] is none}}

It correctly fetches some pages, but the stop condition seems unstable: I don't retrieve the full data, and pagination usually stops early. I tried running a full sync instead of just testing it through the no-code builder, but the sync just never stops for some reason.

My pagination yaml looks like this:

```yaml
page_token_option:
  type: RequestOption
  inject_into: request_parameter
  field_name: starting_after
page_size_option:
  inject_into: request_parameter
  type: RequestOption
  field_name: limit
pagination_strategy:
  type: CursorPagination
  page_size: 10
  cursor_value: '{{ last_record["id"] }}'
  stop_condition: '{{response[''next_page''] is none}}'
```
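
For comparison, here is a minimal Python sketch of the loop the paginator has to perform. It assumes, based on my reading of the linked RevenueCat docs, that list responses carry their records under `items` and a `next_page` path that is missing or null on the last page. The `fetch` callable, `iter_records`, and the in-memory `FAKE_PAGES` are hypothetical stand-ins, not part of the connector or the Airbyte CDK:

```python
from typing import Callable, Dict, Iterator, Optional


def iter_records(fetch: Callable[[str], dict], first_path: str) -> Iterator[dict]:
    """Yield every record by following `next_page` until it is missing or null."""
    path: Optional[str] = first_path
    visited = set()
    while path:
        # A repeated path means the cursor is not advancing; fail instead of looping forever.
        if path in visited:
            raise RuntimeError(f"pagination loop detected at {path}")
        visited.add(path)
        page = fetch(path)
        yield from page.get("items", [])
        path = page.get("next_page")


# Hypothetical two-page response set used in place of real HTTP calls.
FAKE_PAGES: Dict[str, dict] = {
    "/customers?limit=2": {
        "items": [{"id": "a"}, {"id": "b"}],
        "next_page": "/customers?limit=2&starting_after=b",
    },
    "/customers?limit=2&starting_after=b": {
        "items": [{"id": "c"}],
        "next_page": None,
    },
}

ids = [record["id"] for record in iter_records(FAKE_PAGES.__getitem__, "/customers?limit=2")]
print(ids)
```

The guard against revisiting a path is included because a cursor that stops advancing would be one possible explanation for a sync that never ends.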

I can share credentials to help debug. Additionally, with the help of someone with a bit of experience in custom pagination, I think I can produce a working release.

Contribute

  • Yes, I want to contribute
@natikgadzhi
Contributor

Uh oh.

Adding pagination should be straightforward.

  1. You'd want to "fork" the connector in Builder (find it in the list of sources in Airbyte Cloud or OSS, or go to Connector Builder → New → Fork existing)
  2. Add your credentials
  3. Add pagination
  4. When you press the test button on each stream, it will also refresh schemas
  5. When you're done, press "Publish → Contribute to Airbyte" — this will prompt you to make a pull request.

@natikgadzhi
Contributor

@tomdarmon-appstack let me know if you're down to try and do that in Builder. If not, I'll ask one of our community devs to fix this up.

@github-project-automation github-project-automation bot moved this to 📥 Triaging in Community Board Mar 5, 2025
@natikgadzhi natikgadzhi moved this from 📥 Triaging to 🐩 Grooming in Community Board Mar 5, 2025
@tomdarmon-appstack
Author

Hey @natikgadzhi, Thanks for the tips.

I took a full day to try and do it. But the API is weird; I have a draft on my Airbyte instance, but it doesn't seem to work.

The request returns a next_page key with the URL for the next request. But I was unable to leverage this, so I manually take the id of the last_record and fetch the data from there, but the API returns duplicates when I do this (maybe an error on their side?).
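
One way to reuse `next_page` without requesting it directly might be to pull the `starting_after` value out of its query string, and to drop repeated ids defensively. A minimal Python sketch, assuming `next_page` is a path or URL whose query string carries `starting_after`; the helper names and sample values below are hypothetical:

```python
from typing import Iterable, Iterator, Optional
from urllib.parse import parse_qs, urlsplit


def cursor_from_next_page(next_page: Optional[str]) -> Optional[str]:
    """Return the `starting_after` query value of `next_page`, or None on the last page."""
    if not next_page:
        return None
    values = parse_qs(urlsplit(next_page).query).get("starting_after")
    return values[0] if values else None


def dedupe_by_id(records: Iterable[dict]) -> Iterator[dict]:
    """Yield each record once, keeping the first occurrence of every id."""
    seen = set()
    for record in records:
        if record["id"] not in seen:
            seen.add(record["id"])
            yield record


# Hypothetical sample values, not real RevenueCat data.
cursor = cursor_from_next_page("/v2/projects/proj_x/customers?limit=10&starting_after=cus_42")
unique = list(dedupe_by_id([{"id": "a"}, {"id": "b"}, {"id": "a"}]))
print(cursor, unique)
```

Deduplicating only hides the symptom. If the duplicates come from a wrong cursor value, taking the cursor from `next_page` rather than from the last record is the part more likely to matter.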

If anyone takes this issue, I can share my draft. When this becomes critical, I might come back and give it another try at some point :(

@btkcodedev
Collaborator

I'll take this up

@btkcodedev btkcodedev self-assigned this Mar 7, 2025
@btkcodedev btkcodedev moved this from 🐩 Grooming to 👀 In Review in Community Board Mar 7, 2025
@github-project-automation github-project-automation bot moved this from 👀 In Review to ✔️ Done in Community Board Mar 7, 2025
@natikgadzhi
Contributor

Try it out, @btkcodedev shipped it ;-)

@sheinbergon
Contributor

sheinbergon commented Apr 7, 2025

@natikgadzhi @btkcodedev so, that's still completely broken in terms of pagination (from what I could gather). Their API is indeed very weird. I think there's no alternative other than writing a custom paginator that extracts the next cursor value properly. I'll take a shot at this in the upcoming weeks.

@btkcodedev
Collaborator

@sheinbergon
Pagination looked fine for the streams the last time I checked. Could you let me know which specific streams you're having trouble with?

@sheinbergon
Contributor

All of them. I'm expecting hundreds of thousands of entries and getting merely a few hundred. Are you saying you're seeing full parity with the data displayed in the RC dashboards?

@tomdarmon-appstack
Author

Yep, last time I checked the changes were not enough, I was still missing a lot of data.

I think this issue was closed a bit fast for this connector :(

@btkcodedev
Collaborator

btkcodedev commented Apr 11, 2025

Sure @sheinbergon, if you have custom logic for pagination, please do.
I'll check on my side. Thanks!

@sheinbergon
Contributor

@btkcodedev @tomdarmon-appstack OK, I managed to fix the connector, avoiding the need to migrate to low-code.
I've also added retry strategies. I'm currently testing locally and will open a PR once everything is running smoothly.

@tomdarmon-appstack
Author

Thanks a lot! If you want me to try it before merging the PR, I can add the manifest and try it on my data :)

@sheinbergon
Contributor

sheinbergon commented Apr 22, 2025

So it's working. However, this connector is pretty useless, and not because of Airbyte.

RC's API doesn't allow for true incremental syncs. You are expected to "sync everything" every time you run. For live applications, this will amount to hundreds of millions of records. In fact, the manifest erroneously marks some streams as incremental, even though a descending return order is not guaranteed.
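
To illustrate why the ordering matters, here is a minimal Python sketch with made-up data. The `updated_at` field, the integer timestamps, and both function names are hypothetical. An incremental read that stops at the first record older than the checkpoint is only correct when records arrive newest-first:

```python
from typing import List


def newer_with_early_stop(records: List[dict], checkpoint: int) -> List[dict]:
    """Incremental-style read: stop at the first record not newer than the checkpoint."""
    result = []
    for record in records:
        if record["updated_at"] <= checkpoint:
            break  # only safe if the API guarantees descending order
        result.append(record)
    return result


def newer_with_full_scan(records: List[dict], checkpoint: int) -> List[dict]:
    """Full read: inspect every record and keep the ones newer than the checkpoint."""
    return [record for record in records if record["updated_at"] > checkpoint]


# Made-up records returned in no particular order.
unordered = [
    {"id": "a", "updated_at": 5},
    {"id": "b", "updated_at": 2},
    {"id": "c", "updated_at": 7},
]

early = [r["id"] for r in newer_with_early_stop(unordered, checkpoint=3)]
full = [r["id"] for r in newer_with_full_scan(unordered, checkpoint=3)]
print(early, full)
```

With unordered input the early-stop variant silently misses record `c`, which is the kind of data loss an incorrectly incremental stream would produce.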

This scale may work well for batch APIs (like the customers stream). But if you need to start mapping subscriptions (meaning issuing an API call per customer), it really stops making sense. It would make more sense to work manually with their exported daily dumps.

I'm uploading the fixed manifest here. If you'd like to go ahead and merge it, feel free to do so

Good luck

revenue_cat.yaml.txt

@tomdarmon-appstack
Author

tomdarmon-appstack commented Apr 22, 2025

Ok, it looks like you put a lot of effort into this.

Thanks a lot, I think I will avoid using this...

5 participants