Skip to content

Source MSSQL: datetime fields with millisecond precision are being rounded down to the second, causing rows to keep incrementally syncing when they shouldn't and replicating incorrect precision to destination #10262

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
sparkysean opened this issue Feb 11, 2022 · 1 comment
Labels
area/connectors Connector related issues area/databases community connectors/destination/mssql frozen Not being actively worked on team/db-dw-sources Backlog for Database and Data Warehouse Sources team type/bug Something isn't working

Comments

@sparkysean
Copy link

Environment

  • Airbyte version: 0.35.27-alpha
  • OS Version / Instance: AWS EC2
  • Deployment: Docker
  • Source Connector and version: Microsoft SQL Server (MSSQL) v0.3.14
  • Destination Connector and version: Snowflake v0.4.8
  • Severity: Medium-High
  • Step where error happened: Sync job

Current Behaviour

Initial load completes as expected (all rows replicated). However, running a subsequent load (with the incremental + append option, using ModifiedDate (datetime) as a cursor field), all records are loaded again even though there are no changes in the source MSSQL database cursor field.

In the source database (a standard AdventureWorks SQL Server demo database), all the ModifiedDate values in the Sales.Customer table are set to "2014-09-12 11:15:07.263". When syncing this data, Airbyte saves the cursor value rounded down to "2014-09-12 11:15:07Z", stripping out the millisecond component. The next time a sync runs, of course 2014-09-12 11:15:07.263 is technically greater than 2014-09-12 11:15:07Z, so all records are synced over again when they shouldn't be.

Interestingly, the record payload itself within Snowflake is also rounded, meaning that both the cursor and the replicated data have less precision than the source database:

SQL Server:
image

Snowflake:
image

Expected Behaviour

No changes to the cursor value in the source should yield no records emitted; replicated columns should contain the same precision as the source

Logs

logs-4.txt
logs-5.txt

I've also tried to point out useful bits below with line numbers

Log highlights Initial sync (logs-4.txt), the cursor is null - therefore replicate everything (expected)

[L828]: 2022-02-10 07:03:38 �[source] > 2022-02-10 07:03:38 �[INFO] i.a.i.s.r.StateDecoratingIterator(computeNext):60 - State Report: stream name: AirbyteStreamNameNamespacePair{name='Customer', namespace='Sales'}, original cursor field: null, original cursor null, cursor field: ModifiedDate, new cursor: 2014-09-12T11:15:07Z

However, in the incremental run (logs-5.txt), shenanigans ensue:


[L808]: 2022-02-10 07:04:39 �[source]�> 2022-02-10 07:04:39 �[INFO]�i.a.i.s.m.MssqlSource(lambda$queryTableIncremental$0):111 - Prepared SQL query for queryTableIncremental is: SELECT "CustomerID","PersonID","StoreID","TerritoryID","AccountNumber","rowguid","ModifiedDate" FROM "Sales"."Customer" WHERE "ModifiedDate" > ?

[L830]: 2022-02-10 07:04:44 �[source]> 2022-02-10 07:04:43 [INFO] i.a.i.s.r.StateDecoratingIterator(computeNext):60 - State Report: stream name: AirbyteStreamNameNamespacePair{name='Customer', namespace='Sales'}, original cursor field: ModifiedDate, original cursor 2014-09-12T11:15:07Z, cursor field: ModifiedDate, new cursor: 2014-09-12T11:15:07Z

Steps to Reproduce

  1. Create SQL Server source (e.g. a basic AdventureWorks RDS database in my case)
  2. Create a Snowflake destination
  3. Create a connection between the two using incremental method, replicate Sales.Customer table, using cursor column ModifiedDate (datetime)
  4. Run initial sync
  5. Run incremental sync

Are you willing to submit a PR?

Willing, but perhaps not all that able! :) I'm not super familiar with the Airbyte codebase, so it may be more efficient for someone else to look into it who knows it better.

This issue looks broadly pretty similar to existing issues #8904 and #9915, but wanted to call out that MSSQL is also affected.

@sparkysean sparkysean added needs-triage type/bug Something isn't working labels Feb 11, 2022
@bleonard bleonard added autoteam team/tse Technical Support Engineers labels Apr 26, 2022
@grishick grishick added the team/db-dw-sources Backlog for Database and Data Warehouse Sources team label Sep 27, 2022
@marcosmarxm marcosmarxm changed the title MSSQL datetime fields with millisecond precision are being rounded down to the second, causing rows to keep incrementally syncing when they shouldn't and replicating incorrect precision to destination Source MSSQL: datetime fields with millisecond precision are being rounded down to the second, causing rows to keep incrementally syncing when they shouldn't and replicating incorrect precision to destination Nov 30, 2022
@bleonard bleonard added the frozen Not being actively worked on label Mar 22, 2024
@octavia-squidington-iii
Copy link
Collaborator

At Airbyte, we seek to be clear about the project priorities and roadmap. This issue has not had any activity for 365 days, suggesting that it's not as critical as others. It's possible it has already been fixed. It is being marked as stale and will be closed in 20 days if there is no activity. To keep it open, please comment to let us know why it is important to you and if it is still reproducible on recent versions of Airbyte.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/databases community connectors/destination/mssql frozen Not being actively worked on team/db-dw-sources Backlog for Database and Data Warehouse Sources team type/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants