Skip to content

[source-mssql] DatetimeOffset datatype not properly managed to AVRO/PARQUET with logical type #44858

Open
@kev-datams

Description

@kev-datams

Connector Name

source-mssql

Connector Version

4.1.3

What step the error happened?

Other

Relevant information

Hi team ✋

We are using the MSSQL source connector to retrieve data from DatetimeOffset datatype fields (marked as Timestamp with Timezone in Airbyte connector configuration UI) to AVRO files stored on GCS. A BigQuery external table, with logical_types inference enabled, references the files to load them via SQL queries using dbt.
image

For instance for the BookingDate_OriginalOffset field, the AVRO schema is:

{
    "name":"BookingDate_OriginalOffset",
    "type": [
         "null",
         {"type":"long","logicalType":"timestamp-micros"}
         ,"string"
     ],
    "default":null
}

🔴 We upgraded the MSSQL source connector to the v4.1.3 end of July, and observed that DatetimeOffset fields data was now stored into long_value instead of string_value in the AVRO files.
Nothing bad to retrieve the data incrementally and read it via external table in BigQuery until this week, where our SQL query reading the AVRO files via external table failed with below error:

Error while reading table: xxx, error message: Invalid timestamp value '-62135769600000000' for field 'long_timestamp_micros_value' of type 'long' File: bigstore/xxx.avro

ℹ️ We identified that DatetimeOffset field value in MSSQL database is 0001-01-01 00:00:00.0000000 +00:00, leading to the -62135769600000000 long value, instead of 0001-01-01 00:00:00.0000000 +00:00 string value before the connector upgrade.
image

The only workaround on our side is to disable the logical_types inference on our external table in order to manually parse and manage long values, including not "timestampable" ones like -62135769600000000.

🙏 Please could you have a look at this issue in order to continue storing the original DatetimeOffset field value as string instead of long if logical inference is not possible for extreme values ?

NB:

  • reading the AVRO file using Python library also produced an error at inference time for this value
  • as Parquet uses AVRO at first step, I suppose the issue is also present for PARQUET

How to reproduce the issue

  1. Ensure using mssql-source connector version > 4.1.3
  2. Create a MSSQL table with DatetimeOffset field with some rows, including one with value 0001-01-01 00:00:00.0000000 +00:00
  3. Trigger the sync to any destination using AVRO like GCS
  4. Read the resulting AVRO file using logical types inference => KO
  5. Read the resulting AVRO file without using logical types inference => OK

Possibly PR causing the issue

Relevant log output

No response

Contribute

  • Yes, I want to contribute

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions