Closed
Description
Environment
- Airbyte version: 0.38.4-alpha
- OS Version / Instance: Ubuntu 20.04 docker image
- Deployment: EKS, K8S stable deployment
- Source Connector and version: Amplitude, BETA
- Destination Connector and version: Not relevant (this issue applies to any destination); using Kafka
- Severity: High
- Step where error happened: Setup new connection
Current Behavior
Ingesting data from Amplitude produces log entries like:
2022-05-20 14:24:26 replication-orchestrator > Failed to apply RFC3339 pattern on 2022-05-16 19:13:09.369000
2022-05-20 14:24:26 replication-orchestrator > Failed to apply RFC3339 pattern on 2022-05-18 02:18:54.351000
2022-05-20 14:24:26 replication-orchestrator > Failed to apply RFC3339 pattern on 2022-05-18 02:18:54.349000
2022-05-20 14:24:26 replication-orchestrator > Failed to apply RFC3339 pattern on 2022-05-18 02:18:57.113881
2022-05-20 14:24:26 replication-orchestrator > Failed to apply RFC3339 pattern on 2022-05-18 02:18:54.349000
2022-05-20 14:24:26 replication-orchestrator > Failed to apply RFC3339 pattern on 2022-05-18 02:18:54.349000
2022-05-20 14:24:26 replication-orchestrator > Failed to apply RFC3339 pattern on 2022-05-18 02:18:54.349000
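The timestamps in these messages use a space separator and carry no timezone offset, so they are not valid RFC3339. A minimal sketch of normalizing such a value (assuming, per the Amplitude Export API docs, that the value is in UTC; `to_rfc3339` is a hypothetical helper, not part of Airbyte):

```python
from datetime import datetime, timezone

# Amplitude's export format: space separator, microseconds, no offset.
AMPLITUDE_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

def to_rfc3339(ts: str) -> str:
    # Parse the Amplitude timestamp, assume UTC, and re-emit it as RFC3339
    # ("T" separator plus an explicit "Z" offset).
    dt = datetime.strptime(ts, AMPLITUDE_FORMAT).replace(tzinfo=timezone.utc)
    return dt.isoformat().replace("+00:00", "Z")

print(to_rfc3339("2022-05-16 19:13:09.369000"))
# → 2022-05-16T19:13:09.369000Z
```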
I do see the data getting exported:
{
"_airbyte_ab_id": "xxxx",
"_airbyte_stream": "events",
"_airbyte_emitted_at": 1652788324908,
"_airbyte_data": {
"$insert_id": "xxxx",
"$insert_key": "xxxx",
"$schema": xxxx,
"adid": null,
"amplitude_attribution_ids": null,
"amplitude_event_type": null,
"amplitude_id": xxxx,
"app": xxxx,
"city": null,
"client_event_time": "2022-05-16 19:13:09.369000",
"client_upload_time": "2022-05-16 19:13:09.369000",
"country": "United States",
"data": {
"group_first_event": {
},
"group_ids": {
}
},
"data_type": "event",
"device_brand": null,
"device_carrier": null,
"device_family": null,
"device_id": "xxxxx",
"device_manufacturer": null,
"device_model": null,
"device_type": null,
"dma": null,
"event_id": 929601028,
"event_properties": {
},
"event_time": "2022-05-16 19:13:09.369000",
"event_type": "watch_tutorial",
"global_user_properties": {
},
"group_properties": {
},
"groups": {
},
"idfa": null,
"ip_address": "127.0.0.1",
"is_attribution_event": false,
"language": null,
"library": "http/2.0",
"location_lat": null,
"location_lng": null,
"os_name": null,
"os_version": null,
"partner_id": null,
"paying": null,
"plan": {
},
"platform": null,
"processed_time": "2022-05-16 19:13:12.299914",
"region": null,
"sample_rate": null,
"server_received_time": "2022-05-16 19:13:09.369000",
"server_upload_time": "2022-05-16 19:13:09.375000",
"session_id": -1,
"start_version": null,
"user_creation_time": "2022-05-16 19:13:09.369000",
"user_id": "[email protected]",
"user_properties": {
"Cohort": "Test A"
},
"uuid": "3aa93d06-d54c-11ec-9cec-b7b7bb95f4cc",
"version_name": null
}
}
Setting up the connection with the Incremental | Append sync mode does not seem to work: we see duplicated entries in the destination. Could it be because event_time, the cursor field for the Amplitude Events stream, is not in the RFC3339 pattern but is a "UTC ISO-8601 formatted timestamp"? See https://developers.amplitude.com/docs/export-api
Expected Behavior
- No errors in the logs
- Incremental | Append should not produce duplicated entries, given that event_time has fine enough granularity that syncs run a few hours apart should not overlap
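The deduplication expectation above comes down to cursor comparison. A minimal sketch of how incremental-append filtering against a saved cursor could behave (hypothetical `filter_new_records` helper and record shapes, not Airbyte's actual implementation):

```python
def filter_new_records(records, last_cursor):
    # Because the timestamps are fixed-width ("YYYY-MM-DD HH:MM:SS.ffffff"),
    # lexicographic string comparison orders them chronologically, so a
    # strict ">" against the saved cursor drops already-synced rows.
    return [r for r in records if r["event_time"] > last_cursor]

synced = filter_new_records(
    [
        {"event_time": "2022-05-16 19:13:09.369000"},  # already synced
        {"event_time": "2022-05-18 02:18:54.349000"},  # new since last sync
    ],
    last_cursor="2022-05-16 19:13:09.369000",
)
print(synced)
# → [{'event_time': '2022-05-18 02:18:54.349000'}]
```

If the cursor comparison (or its RFC3339 parsing step) fails silently, records at or before the cursor would be re-emitted, which would explain the duplicates observed above.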
Logs
Output pasted above
Steps to Reproduce
- Set up the Amplitude source
- Set up any destination (Kafka or S3 preferred; I tested on both)
- Replicate "events" stream with "Incremental | Append" sync mode
Are you willing to submit a PR?
I can look into the error, but I am not sure where it is surfacing from right now. With some guidance, I am more than happy to contribute :)