Source Acceptance Tests: invalid test case test_sequential_reads #13148

Closed
@davydov-d

Description

  • First, the test rests on the naive assumption that the results of two consecutive full_refresh reads must be strictly equal, or at least that one must be a subset of the other (records are compared by primary key if it is present, or by record hash, with an option to ignore some fields). That is incorrect, since there is no guarantee that records will not be removed between the two reads.
  • Second, to decide whether one set of records is a subset of the other, the test checks that their symmetric_difference is empty. That is also incorrect, since an empty symmetric difference means the two sets are exactly equal. A subset check should be used instead.
  • Third, when records are compared by primary key, the primary key of a configured stream defined in the configured_catalog.json file is often missing, and because the field is optional this is neither validated nor highlighted in any way. As a result, records are compared by hash instead of by primary key. My suggestion is to log a warning when ConfiguredStream.Stream.source_defined_primary_key is not falsy but ConfiguredStream.primary_key is.
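
To make the second and third points concrete, here is a minimal sketch in Python. It is not the actual acceptance test code: the function names, the logger name, and the plain-argument signature of `warn_on_missing_primary_key` are all hypothetical and chosen only for illustration. It shows that an empty symmetric difference is an equality check rather than a subset check, and what a warning for a missing configured primary key might look like.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("acceptance_tests_sketch")


def passes_symmetric_difference_check(first: set, second: set) -> bool:
    """The check described in the issue: true only when both sets are equal."""
    return not first.symmetric_difference(second)


def passes_subset_check(first: set, second: set) -> bool:
    """The suggested check: every record of the first read is in the second."""
    return first.issubset(second)


def warn_on_missing_primary_key(stream_name, source_defined_primary_key, primary_key) -> bool:
    """Warn when the source defines a primary key but the configured stream omits it.

    Returns True if a warning was logged. The arguments stand in for
    ConfiguredStream.Stream.source_defined_primary_key and
    ConfiguredStream.primary_key; passing them as plain values is an assumption.
    """
    if source_defined_primary_key and not primary_key:
        logger.warning(
            "Stream %s: source defines primary key %s but the configured stream has none; "
            "records will be compared by hash.",
            stream_name,
            source_defined_primary_key,
        )
        return True
    return False


# A record with key 4 appears between the two reads.
first_read = {1, 2, 3}
second_read = {1, 2, 3, 4}

equality_result = passes_symmetric_difference_check(first_read, second_read)  # False
subset_result = passes_subset_check(first_read, second_read)  # True
warned = warn_on_missing_primary_key("users", [["id"]], None)  # True
```

Note that the subset check still fails when a record is removed between reads, which is the first point above: neither check is valid unless the source guarantees that records are never deleted.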
