Amplitude source has errors: "Failed to apply RFC3339 pattern on .." for "Events" stream with "Incremental | Append" sync mode #13057
Comments
Since you're using the Kafka or S3 destination, there is no support for normalization (de-duplication) there. About the data being exposed:
This could be related, haven't faced this issue yet. Could you please share the complete sync log for examination? Thanks.
That makes sense for that mode. But correct me if I am wrong: in the incremental sync, append mode, only the new or updated records are appended, based on the cursor field, I believe.
We see this in the Kafka topic which was configured as the destination. We had configured a replication period of 5 minutes, so we saw duplicated records (which was not what we expected given our understanding of the Incremental | Append mode).
We have lost the logs since we were prototyping locally. I will post one once I am able to reproduce this with the Kafka destination specifically.
Correct, the connector should return at least 1 record message, using the cursor value saved from the previous sync.
In this case, you should consider removing the duplicates using any other means available to you (one possible post-processing sketch follows below). My suggestion here is to use a Postgres DB on a remote cluster or on premises.
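Not part of the original comment; a minimal Python sketch of one way duplicates could be dropped downstream when the destination (e.g. Kafka) cannot deduplicate. The `uuid` key field is an assumption, substitute whatever uniquely identifies an event in your data:

```python
from typing import Dict, Iterable, Iterator


def drop_duplicates(records: Iterable[Dict], key_fields=("uuid",)) -> Iterator[Dict]:
    """Keep only the first occurrence of each record, keyed on the given fields."""
    seen = set()
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            yield record


# Example: the second record shares the same uuid and is dropped.
events = [
    {"uuid": "a-1", "event_type": "click"},
    {"uuid": "a-1", "event_type": "click"},
    {"uuid": "b-2", "event_type": "view"},
]
print(list(drop_duplicates(events)))  # -> two records, "a-1" and "b-2"
```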
Exactly. I'm not aware of how the Kafka connector works at this very moment, but I think there is no way to avoid duplicate records there, because of missing de-duplication support. I'll investigate the issue.
Awesome, thank you! Please do keep this thread posted as and when it happens. Coming back to the Incremental | Append mode:
I think I am still failing to understand the "Incremental Append" mode and fetching only the "updated" records from the last sync. Consider this example:
Since this is the first sync, it is essentially a full refresh, so all the data is written to the destination (it does not matter which destination). In the next sync, say it gets kicked off at 21:00, the connector will try to fetch data from 15:00 - 21:00; say we get:
Given the definition of Incremental | Append, we expected only the records newer than the last synced cursor value to be emitted (roughly as in the sketch below). The behavior we see right now is that all the records in this batch are emitted. The problem is not about de-duplication; it's about redundant records being sent, which, by the definition of Incremental | Append, should not happen.
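Not from the thread; a minimal sketch of the filtering behaviour being described as expected, assuming a cursor field named `event_time` and a state value saved from the previous sync (names are illustrative, not the connector's real implementation):

```python
from typing import Dict, Iterable, Iterator


def incremental_append(records: Iterable[Dict], last_cursor: str,
                       cursor_field: str = "event_time") -> Iterator[Dict]:
    """Yield only records strictly newer than the cursor saved by the previous sync.

    Illustrative only: `last_cursor` stands in for the connector state, and
    lexicographic comparison is enough for zero-padded UTC timestamps.
    """
    for record in records:
        if record[cursor_field] > last_cursor:
            yield record


# Example: a sync at 21:00 with saved state "2022-05-20 15:00:00" should emit
# only records produced after 15:00, not the whole 15:00-21:00 batch again.
batch = [
    {"event_time": "2022-05-20 14:59:00", "event": "old"},
    {"event_time": "2022-05-20 18:30:00", "event": "new"},
]
print(list(incremental_append(batch, "2022-05-20 15:00:00")))  # -> only the 18:30:00 record
```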
Yes, you're right, there is a 6-hour offset from the last sync's cursor value.
@mohitreddy1996 Regarding the time offset and duplicates:
The fix: #13074. Please be aware that after #13074 is merged, this issue will be closed automatically.
@bazarnov thank you so much! Will definitely try and let you know if things work for us :)
Environment
Current Behavior
Ingesting data from Amplitude produces logs with entries like "Failed to apply RFC3339 pattern on ..":
I do see the data getting exported:
Setting up the connection with sync mode as Incremental | Append does not seem to work; we do see duplicated entries in the destination. Could it be because event_time, which is the cursor for the Amplitude Events stream, is not in the RFC3339 pattern but a "UTC ISO-8601 formatted timestamp" (https://developers.amplitude.com/docs/export-api)?
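Not part of the original report; a minimal Python sketch of the suspected format mismatch. The sample event_time value and the pattern strings are assumptions for illustration (based on the documented "UTC ISO-8601 formatted timestamp" shape), not the connector's actual code:

```python
from datetime import datetime

# Assumed example of Amplitude's documented event_time shape:
# space separator, no timezone offset.
event_time = "2021-08-10 12:00:00.123456"

# An RFC3339 timestamp uses a "T" separator and an explicit UTC offset,
# e.g. "2021-08-10T12:00:00.123456Z", so a strict RFC3339-style parse fails:
RFC3339_PATTERN = "%Y-%m-%dT%H:%M:%S.%f%z"
try:
    datetime.strptime(event_time, RFC3339_PATTERN)
except ValueError as err:
    print(f"Failed to apply RFC3339 pattern: {err}")

# Parsing with a pattern that matches the documented format works, and the
# value can then be normalized to RFC3339 before being used as a cursor:
parsed = datetime.strptime(event_time, "%Y-%m-%d %H:%M:%S.%f")
print(parsed.isoformat() + "Z")  # -> 2021-08-10T12:00:00.123456Z
```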
Expected Behavior
event_time has fine-enough granularity that syncs which are a few hours apart should not have duplicated entries.
Logs
Output pasted above
Steps to Reproduce
Are you willing to submit a PR?
I can take a look into the error, but I am not sure where the error is surfacing from right now. With some guidance, I am more than happy to contribute :)