
Amplitude source has errors "Failed to apply RFC3339 pattern on .." for "Events" stream with "Incremental | Append" sync mode #13057

Closed
@mohitreddy1996

Description


Environment

  • Airbyte version: 0.38.4-alpha
  • OS Version / Instance: Ubuntu 20.04 docker image
  • Deployment: EKS, K8S stable deployment
  • Source Connector and version: Amplitude, BETA
  • Destination Connector and version: Not relevant (this issue occurs with any destination); using Kafka
  • Severity: High
  • Step where the error happened: Setting up a new connection

Current Behavior

Ingesting data from Amplitude produces log entries like these:

2022-05-20 14:24:26 replication-orchestrator > Failed to apply RFC3339 pattern on 2022-05-16 19:13:09.369000
2022-05-20 14:24:26 replication-orchestrator > Failed to apply RFC3339 pattern on 2022-05-18 02:18:54.351000
2022-05-20 14:24:26 replication-orchestrator > Failed to apply RFC3339 pattern on 2022-05-18 02:18:54.349000
2022-05-20 14:24:26 replication-orchestrator > Failed to apply RFC3339 pattern on 2022-05-18 02:18:57.113881
2022-05-20 14:24:26 replication-orchestrator > Failed to apply RFC3339 pattern on 2022-05-18 02:18:54.349000
2022-05-20 14:24:26 replication-orchestrator > Failed to apply RFC3339 pattern on 2022-05-18 02:18:54.349000
2022-05-20 14:24:26 replication-orchestrator > Failed to apply RFC3339 pattern on 2022-05-18 02:18:54.349000
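To illustrate why these values could be rejected, here is a minimal Python sketch that checks strings against an approximation of the RFC 3339 date-time grammar. The regex RFC3339_DATE_TIME and the helper is_rfc3339 are hypothetical stand-ins, not the pattern the replication orchestrator actually uses. The relevant point is that RFC 3339 requires a time offset (Z or ±hh:mm), and none of the logged values has one.

```python
import re

# Hypothetical approximation of the RFC 3339 "date-time" production:
# full-date, a "T" separator (RFC 3339 notes applications may use a space),
# partial-time with optional fractional seconds, and a REQUIRED offset.
RFC3339_DATE_TIME = re.compile(
    r"^\d{4}-\d{2}-\d{2}[Tt ]\d{2}:\d{2}:\d{2}(\.\d+)?([Zz]|[+-]\d{2}:\d{2})$"
)


def is_rfc3339(value: str) -> bool:
    """Return True if value matches the approximate RFC 3339 pattern."""
    return RFC3339_DATE_TIME.match(value) is not None


logged_values = [
    "2022-05-16 19:13:09.369000",
    "2022-05-18 02:18:57.113881",
]

for v in logged_values:
    # Both fail: neither ends with "Z" or an offset such as "+00:00".
    print(v, is_rfc3339(v))

# The same instant with an explicit UTC offset matches.
print(is_rfc3339("2022-05-16T19:13:09.369000Z"))
```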

I do see the data getting exported:

{
  "_airbyte_ab_id": "xxxx",
  "_airbyte_stream": "events",
  "_airbyte_emitted_at": 1652788324908,
  "_airbyte_data": {
    "$insert_id": "xxxx",
    "$insert_key": "xxxx",
    "$schema": xxxx,
    "adid": null,
    "amplitude_attribution_ids": null,
    "amplitude_event_type": null,
    "amplitude_id": xxxx,
    "app": xxxx,
    "city": null,
    "client_event_time": "2022-05-16 19:13:09.369000",
    "client_upload_time": "2022-05-16 19:13:09.369000",
    "country": "United States",
    "data": {
      "group_first_event": {

      },
      "group_ids": {

      }
    },
    "data_type": "event",
    "device_brand": null,
    "device_carrier": null,
    "device_family": null,
    "device_id": "xxxxx",
    "device_manufacturer": null,
    "device_model": null,
    "device_type": null,
    "dma": null,
    "event_id": 929601028,
    "event_properties": {

    },
    "event_time": "2022-05-16 19:13:09.369000",
    "event_type": "watch_tutorial",
    "global_user_properties": {

    },
    "group_properties": {

    },
    "groups": {

    },
    "idfa": null,
    "ip_address": "127.0.0.1",
    "is_attribution_event": false,
    "language": null,
    "library": "http/2.0",
    "location_lat": null,
    "location_lng": null,
    "os_name": null,
    "os_version": null,
    "partner_id": null,
    "paying": null,
    "plan": {

    },
    "platform": null,
    "processed_time": "2022-05-16 19:13:12.299914",
    "region": null,
    "sample_rate": null,
    "server_received_time": "2022-05-16 19:13:09.369000",
    "server_upload_time": "2022-05-16 19:13:09.375000",
    "session_id": -1,
    "start_version": null,
    "user_creation_time": "2022-05-16 19:13:09.369000",
    "user_id": "[email protected]",
    "user_properties": {
      "Cohort": "Test A"
    },
    "uuid": "3aa93d06-d54c-11ec-9cec-b7b7bb95f4cc",
    "version_name": null
  }
}

Setting up the connection with the Incremental | Append sync mode does not seem to work: we see duplicated entries in the destination.
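One way to confirm the duplicates is to count destination records that share the same Amplitude uuid. This is a sketch assuming the destination output has been dumped as newline-delimited JSON in the shape of the record above (event fields nested under _airbyte_data); find_duplicate_uuids is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def find_duplicate_uuids(lines: Iterable[str]) -> Dict[str, int]:
    """Return {uuid: count} for every event uuid seen more than once.

    Assumes each non-empty line is one Airbyte record with the Amplitude
    event nested under "_airbyte_data", as in the example record above.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the dump
        record = json.loads(line)
        counts[record["_airbyte_data"]["uuid"]] += 1
    return {uuid: n for uuid, n in counts.items() if n > 1}


sample = [
    '{"_airbyte_data": {"uuid": "3aa93d06-d54c-11ec-9cec-b7b7bb95f4cc"}}',
    '{"_airbyte_data": {"uuid": "another-event"}}',
    '{"_airbyte_data": {"uuid": "3aa93d06-d54c-11ec-9cec-b7b7bb95f4cc"}}',
]
print(find_duplicate_uuids(sample))
```

Keying on uuid (or $insert_id) rather than event_time avoids false positives from distinct events that happen to share a timestamp.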

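Could this be because event_time, the cursor field for the Amplitude Events stream, is not in RFC 3339 format but is a "UTC ISO-8601 formatted timestamp" (https://developers.amplitude.com/docs/export-api)? The values above, e.g. 2022-05-16 19:13:09.369000, carry no UTC offset (Z or +00:00), which RFC 3339 requires.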
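If that is the cause, one possible fix is to normalize Amplitude timestamps to RFC 3339 before records are emitted. A minimal sketch, assuming the values are always UTC (as the Amplitude Export API documentation describes) and use the YYYY-MM-DD HH:MM:SS[.ffffff] layout seen in the record above; to_rfc3339 is a hypothetical helper, not part of the connector.

```python
from datetime import datetime, timezone

# Layouts observed in the exported record, e.g. "2022-05-16 19:13:09.369000".
# The second entry is an assumed fallback for values without fractional seconds.
AMPLITUDE_FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")


def to_rfc3339(value: str) -> str:
    """Convert an Amplitude timestamp (assumed to be UTC) to RFC 3339."""
    for fmt in AMPLITUDE_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # Attach UTC explicitly; isoformat() then emits "T" and "+00:00".
        return parsed.replace(tzinfo=timezone.utc).isoformat()
    raise ValueError(f"unrecognized Amplitude timestamp: {value!r}")


print(to_rfc3339("2022-05-16 19:13:09.369000"))
# 2022-05-16T19:13:09.369000+00:00
```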

Expected Behavior

  • No errors in the logs
  • Incremental | Append should not produce duplicated entries: event_time has fine enough granularity that syncs a few hours apart should not overlap
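For reference, the expected incremental behaviour can be sketched as cursor filtering: each sync emits only events whose event_time is later than the cursor saved by the previous sync. This is a simplified illustration under stated assumptions, not Airbyte's or the Amplitude connector's actual state handling; filter_new_events and the in-memory state are hypothetical. It compares parsed datetimes rather than raw strings, because strings in different layouts can misorder lexicographically (a space sorts before "T", so "2022-05-16 20:00:00" < "2022-05-16T19:00:00" as strings).

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

AMPLITUDE_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_event_time(value: str) -> datetime:
    """Parse an Amplitude event_time, assumed to be UTC."""
    return datetime.strptime(value, AMPLITUDE_FORMAT).replace(tzinfo=timezone.utc)


def filter_new_events(
    events: List[Dict[str, str]], cursor: Optional[datetime]
) -> Tuple[List[Dict[str, str]], Optional[datetime]]:
    """Return events strictly newer than cursor, plus the updated cursor."""
    new_events = [
        e for e in events
        if cursor is None or parse_event_time(e["event_time"]) > cursor
    ]
    # Advance the cursor to the latest emitted event (unchanged if none).
    new_cursor = max(
        (parse_event_time(e["event_time"]) for e in new_events), default=cursor
    )
    return new_events, new_cursor


first_batch = [
    {"uuid": "a", "event_time": "2022-05-16 19:13:09.369000"},
    {"uuid": "b", "event_time": "2022-05-18 02:18:54.349000"},
]
emitted, state = filter_new_events(first_batch, None)

# The second sync sees the same events again plus one newer event.
second_batch = first_batch + [
    {"uuid": "c", "event_time": "2022-05-18 02:18:57.113881"},
]
emitted_again, state = filter_new_events(second_batch, state)
print([e["uuid"] for e in emitted_again])  # ['c']
```

A strict > comparison can skip an event that arrives later with exactly the boundary timestamp, while >= re-emits boundary events; this sketch does not settle which trade-off the connector makes.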

Logs

Output pasted above

Steps to Reproduce

  1. Set up an Amplitude source
  2. Set up any destination (Kafka or S3 preferred; I tested on both)
  3. Replicate "events" stream with "Incremental | Append" sync mode

Are you willing to submit a PR?

I can look into the error, but I am not yet sure where it is coming from. With some guidance, I am more than happy to contribute :)
