Destination Elasticsearch: add support for incremental sync “overwrite” mode #17594

Open
@marc-marketparts

Description

Tell us about the problem you're trying to solve

We have to index millions of records from Snowflake into an Elasticsearch index.
Our expectation is that an update of a record in Snowflake will update the corresponding document in the Elasticsearch index.

This behaviour is currently available in the Elasticsearch destination connector only in Full Refresh - Overwrite mode (when "UPSERT" mode is enabled in the connector settings, the table's primary key is used as the document id).

We cannot afford to run a full refresh of the index every time: given the data volume, it takes too long for our business use case.
We need to update the index incrementally, but Airbyte's current incremental sync modes are limited to Append (and Deduped for some connectors), which creates new documents in the Elasticsearch index instead of updating the existing ones.

A new sync mode, “Incremental - Overwrite”, would handle this use case (insert new records and update existing ones in the destination).

Once this mode is available, it should be straightforward to implement in the Elasticsearch destination connector, since it only requires passing the primary key as the document id (which is already done in full refresh mode).
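To illustrate the point about document ids, here is a minimal sketch of how records could be turned into an Elasticsearch Bulk API payload where `_id` is derived from the primary key. The function name `build_bulk_payload`, the index name `parts`, the `|` separator for composite keys and the sample records are all hypothetical; this is not the connector's actual code. It only relies on the documented bulk format (an action line followed by a source line). Indexing a document with an `_id` that already exists replaces that document.

```python
import json
from typing import Any, Dict, Iterable, List


def build_bulk_payload(
    index: str, records: Iterable[Dict[str, Any]], primary_key: List[str]
) -> str:
    """Build an NDJSON body for the Elasticsearch Bulk API.

    Each record becomes an `index` action whose `_id` is built from the
    primary key columns, so re-sending a record replaces the existing
    document instead of creating a new one.
    """
    lines = []
    for record in records:
        # Hypothetical id scheme: join composite key values with "|".
        doc_id = "|".join(str(record[col]) for col in primary_key)
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(record))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Two versions of the same source row: both target document id "42".
payload = build_bulk_payload(
    "parts",
    [{"id": 42, "name": "brake pad"}, {"id": 42, "name": "brake pad v2"}],
    primary_key=["id"],
)
print(payload)
```

Because both actions carry the same `_id`, the second one would replace the first in the index rather than add a second document.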

Describe the solution you’d like

Allow an incremental "overwrite" sync mode for developers, and enable the choice of the primary key in the UI.

N.B.: the docs currently define Overwrite as "Overwrite by first deleting existing data in the destination." This would not always hold for the incremental mode: the destination may handle it as an update/insert rather than a delete/insert, and, being incremental, the new mode would not sync deletions from the source.
So either the definition of "overwrite" has to be updated or a new name has to be found (e.g. "merge"). My preference is the former.
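The difference between the two semantics can be sketched with an in-memory dictionary standing in for the index. The function names `full_refresh_overwrite` and `incremental_merge` and the sample data are hypothetical and only model the behaviour described in this issue; they are not Airbyte APIs.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def full_refresh_overwrite(
    destination: Dict[Any, Record], batch: List[Record], key: str
) -> None:
    """Delete everything in the destination, then insert the batch."""
    destination.clear()
    for record in batch:
        destination[record[key]] = record


def incremental_merge(
    destination: Dict[Any, Record], batch: List[Record], key: str
) -> None:
    """Insert new records and replace existing ones; never delete."""
    for record in batch:
        destination[record[key]] = record


index = {1: {"id": 1, "name": "a"}, 2: {"id": 2, "name": "b"}}

# Incremental batch: record 1 was updated and record 3 is new.
# Record 2 was deleted in the source, so it simply does not appear.
incremental_merge(index, [{"id": 1, "name": "a2"}, {"id": 3, "name": "c"}], key="id")
print(sorted(index))  # [1, 2, 3]: record 2 is still in the destination
```

A full refresh with the same batch would leave only records 1 and 3, which is why the existing definition of "overwrite" does not carry over unchanged.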

Describe the alternative you’ve considered or used

  • Partition our records and use multiple Airbyte streams to sync them in multiple indices.
  • Implement an external custom script to index our records.

Additional context

Are you willing to submit a PR?

No
