
[source-pardot] - missing data for certain streams #58666


Open
1 task
crystalchuhl opened this issue Apr 25, 2025 · 2 comments

Comments


crystalchuhl commented Apr 25, 2025

Connector Name

source-pardot

Connector Version

v1.0.10

What step the error happened?

Other

Relevant information

I am seeing some missing data from streams visits, visitors, visitor_activities, visitor_page_views and list_membership. Anyone seeing the same thing? Thanks!

We are missing around 60% of the data for all of the above streams, although we can see new data coming through.

For streams visits, visitors, visitor_activities, visitor_page_views:
We can confirm there is missing data for the following reasons:

  1. We can see certain ids in visitor_page_views but not in the visits table.
  2. We can see the ids when syncing these tables from another tool.
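
A minimal sketch of the kind of check described in the list above, assuming each visitor_page_views row carries the id of its visit. The sample ids are made up for illustration and are not taken from our data:

```python
def missing_ids(reference_ids, synced_ids):
    """Return ids that appear in the reference set but not in the synced table."""
    return sorted(set(reference_ids) - set(synced_ids))


# Hypothetical sample: visit ids referenced by visitor_page_views rows,
# and the ids that actually landed in the visits table.
page_view_visit_ids = [101, 102, 103, 104, 105]
visits_ids = [101, 104]

gaps = missing_ids(page_view_visit_ids, visits_ids)
missing_share = len(gaps) / len(set(page_view_visit_ids))
print(gaps, f"{missing_share:.0%} missing")
```

The same comparison works against the copy synced by the other tool: use its ids as `reference_ids`.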

Relevant log output

Contribute

  • Yes, I want to contribute
@crystalchuhl crystalchuhl added area/connectors Connector related issues needs-triage type/bug Something isn't working labels Apr 25, 2025
@marcosmarxm marcosmarxm changed the title [Pardot v1.0.10] - missing data for certain streams [source-pardot] - missing data for certain streams Apr 29, 2025
@justbeez
Contributor

@crystalchuhl There's a bug right now in how the connector interacts with the Pardot v5 API's weird 100K-record limit on pagination sets: on some of these streams, each sync only pulls 100K records. Usually this will normalize after multiple syncs, since each sync picks up where the last one left off. A quick workaround is to schedule the sync to run every few minutes for just those incremental streams until it catches up, then re-enable the other streams you actually need and go back to your normal schedule.
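
For a rough idea of how long the workaround takes, here is a back-of-the-envelope sketch. It assumes each sync pulls at most 100K records per stream and that a roughly constant number of new records arrives between syncs; the function name and the example numbers are hypothetical:

```python
import math

PAGINATION_SET_LIMIT = 100_000  # records pulled per sync while the bug is present


def syncs_to_catch_up(backlog, new_records_per_sync=0, limit=PAGINATION_SET_LIMIT):
    """Estimate how many capped syncs are needed to drain a backlog."""
    if new_records_per_sync >= limit:
        raise ValueError("new records arrive as fast as they are synced; backlog never shrinks")
    # Each sync reduces the backlog by the cap minus whatever arrived in the meantime.
    return math.ceil(backlog / (limit - new_records_per_sync))


print(syncs_to_catch_up(2_500_000))          # quiet account
print(syncs_to_catch_up(2_500_000, 20_000))  # 20K new records between syncs
```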

The only stream that won't reach eventual consistency right now is prospect_accounts, because the v5 API doesn't support incremental syncs for that stream.

I've been helping a new contributor fix this bug as his first contribution, but I was talking with @marcosmarxm about trying to jump in and resolve it for the remaining streams this week if needed.

I have a bunch of large accounts I can test with, but I don't think I have any that are using visitor tracking . . . so I may need your help to check those when this gets merged :)

@justbeez
Contributor

justbeez commented May 9, 2025

@marcosmarxm I've pushed a fix for this that I've been testing through Builder, which is available in PR #59758.

For non-incremental streams like the current prospect_accounts, this just starts a new pagination set after the initial pagination limit (100K records) is hit. It does this by injecting the query value into the cursor, then parsing it back out.
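
To illustrate the idea (this is not the code from the PR), here is a toy sketch of restarting a pagination set once the cap is hit. `fake_api`, the `restart:` marker, the token format, and the tiny `SET_LIMIT` are all assumptions made so the example runs on its own; the real API and the real cap of 100K differ:

```python
PAGE_SIZE = 2
SET_LIMIT = 6                # stands in for the 100K cap
RESTART_PREFIX = "restart:"  # hypothetical marker for a synthetic cursor


def fake_api(all_ids, id_greater_than=0, page_token=None):
    """Toy API: serves one page from a pagination set capped at SET_LIMIT records."""
    offset = 0
    if page_token is not None:
        # The toy token encodes the set's filter value and the offset within the set.
        id_greater_than, offset = (int(part) for part in page_token.split(":"))
    in_set = [i for i in all_ids if i > id_greater_than][:SET_LIMIT]
    page = in_set[offset:offset + PAGE_SIZE]
    next_offset = offset + PAGE_SIZE
    next_token = f"{id_greater_than}:{next_offset}" if next_offset < len(in_set) else None
    return page, next_token


def read_all(all_ids):
    """Read every record by starting a new pagination set whenever the cap is hit."""
    out, cursor, seen_in_set = [], None, 0
    while True:
        if cursor is not None and cursor.startswith(RESTART_PREFIX):
            # Parse the query value back out of the cursor and start a fresh set.
            last_id = int(cursor[len(RESTART_PREFIX):])
            page, token = fake_api(all_ids, id_greater_than=last_id)
            seen_in_set = 0
        else:
            page, token = fake_api(all_ids, page_token=cursor)
        out.extend(page)
        seen_in_set += len(page)
        if token is not None:
            cursor = token
        elif page and seen_in_set >= SET_LIMIT:
            # Cap reached: inject the last id into the cursor as the next query value.
            cursor = f"{RESTART_PREFIX}{page[-1]}"
        else:
            return out


print(read_all(list(range(1, 16))))
```

Without the restart branch, the loop would stop after the first six ids, which is the toy equivalent of a sync stopping at 100K records.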

The incremental streams are similar, but they also have to handle additional cases in the query parameters (since Pardot's API doesn't allow any other parameter to be present when the pagination key is).
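
A hedged sketch of that constraint, again not the PR's actual logic: when a real page token is present it must be the only parameter, and when the cursor is a synthetic restart marker the filters are sent again with the restart value swapped in. The parameter names (`nextPageToken`, `updatedAtAfterOrEqualTo`, `orderBy`) and the `restart:` prefix are illustrative assumptions:

```python
RESTART_PREFIX = "restart:"  # hypothetical marker for a synthetic cursor


def build_params(cursor, base_filters, restart_param):
    """Build query parameters for the next request from the current cursor."""
    if cursor is None:
        # First request of a sync: plain filters, no pagination key.
        return dict(base_filters)
    if cursor.startswith(RESTART_PREFIX):
        # New pagination set: reuse the filters, overriding the restart filter
        # with the value that was carried inside the cursor.
        params = dict(base_filters)
        params[restart_param] = cursor[len(RESTART_PREFIX):]
        return params
    # Real page token: no other parameter may accompany it.
    return {"nextPageToken": cursor}


filters = {"updatedAtAfterOrEqualTo": "2024-01-01T00:00:00Z", "orderBy": "updatedAt"}
print(build_params(None, filters, "updatedAtAfterOrEqualTo"))
print(build_params("abc123", filters, "updatedAtAfterOrEqualTo"))
print(build_params("restart:2024-06-30T12:00:00Z", filters, "updatedAtAfterOrEqualTo"))
```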

Both cases are far from elegant, but I've synced several hundred million records testing it and haven't seen any issues yet.

@crystalchuhl It's worth noting that if you have very high traffic volume, you may see transient 504 errors when requesting many years of records from visitors, visits, visitor_activities, or visitor_page_views. This seems to be an issue with Pardot's server-side sorting of the result set. Eventually it may be possible to work around this by adding a Split Up Interval with smaller chunks, but I wasn't able to sort out how to manage that alongside the added complexity of fixing the pagination (I'm probably missing something, but it was a BEAST to get this working with the limited parameters available in the cursor and query parameters). You're welcome to give it a go though if you have time!
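
For anyone who wants to try, the chunking idea on its own can be sketched as below. This only shows how a long date range would be cut into smaller windows; it does not show how to combine that with the pagination fix, and the function name and the 30-day step are assumptions:

```python
from datetime import datetime, timedelta


def split_interval(start, end, step):
    """Cut [start, end) into contiguous windows no longer than `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    windows = []
    lower = start
    while lower < end:
        upper = min(lower + step, end)
        windows.append((lower, upper))
        lower = upper
    return windows


# Each window would become its own, smaller request range.
for lower, upper in split_interval(datetime(2024, 1, 1), datetime(2024, 3, 15), timedelta(days=30)):
    print(lower.date(), "->", upper.date())
```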
