Destination Snowflake: _AIRBYTE_UNIQUE_KEY dependent on order of composite primary key fields - can cause duplicates #21330
Labels
area/connectors
Connector related issues
community
connectors/destination/snowflake
connectors/destinations-warehouse
team/destinations
Destinations team's backlog
type/bug
Something isn't working
Environment
Current Behavior
Airbyte normalization dedupes data using an _AIRBYTE_UNIQUE_KEY, which is the MD5 hash of the values of the row's primary key(s). However, if the order of the primary key field names is changed in the config/catalog (only applicable to composite/multiple primary keys), the MD5 hash value (_AIRBYTE_UNIQUE_KEY) for a given row also changes, resulting in duplicate rows (only applicable when deduping data).
Expected Behavior
Deduplication should not be impacted by the order in which composite primary key field names are stored. Ideally an order-independent algorithm would be used, but switching to one would be a breaking change. More realistically, enforcing a fixed sort order of the key fields in the normalization query, regardless of config/catalog order, would solve this.
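To illustrate the order dependence and the sorting workaround, here is a minimal Python sketch. It is not the normalization code itself: the helper `unique_key`, the `-` separator, and the sample row are assumptions made for illustration, and the real key is computed in SQL by the normalization query.

```python
import hashlib


def unique_key(row, primary_key, sort_fields=False):
    """Hypothetical stand-in for _AIRBYTE_UNIQUE_KEY.

    Assumes the key is an MD5 over the primary key values joined with '-'.
    With sort_fields=True the field names are sorted first, so the result
    no longer depends on the order given in the config/catalog.
    """
    fields = sorted(primary_key) if sort_fields else list(primary_key)
    joined = "-".join(str(row[field]) for field in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Illustrative row with a composite primary key (made-up field names).
row = {"account_id": 42, "region": "eu"}

# Same row, same key fields, different order in the catalog.
key_ab = unique_key(row, ["account_id", "region"])
key_ba = unique_key(row, ["region", "account_id"])
print(key_ab == key_ba)  # False: the same row hashes to two different keys

# Sorting the field names before hashing removes the order dependence.
sorted_ab = unique_key(row, ["account_id", "region"], sort_fields=True)
sorted_ba = unique_key(row, ["region", "account_id"], sort_fields=True)
print(sorted_ab == sorted_ba)  # True
```

Note that sorting the fields changes the hash for any stream whose catalog order is not already sorted, so rows written before such a change would not match rows written after it without a rebuild of the normalized tables.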
Steps to Reproduce