Description
Connector Name
source-mssql
Connector Version
4.1.3
What step the error happened?
Other
Relevant information
Hi team ✋
We are using the MSSQL source connector to retrieve data from DatetimeOffset
datatype fields (marked as Timestamp with Timezone
in Airbyte connector configuration UI) to AVRO files stored on GCS. A BigQuery external table, with logical_types inference enabled, references the files to load them via SQL queries using dbt.
For instance for the BookingDate_OriginalOffset
field, the AVRO schema is:
{
"name":"BookingDate_OriginalOffset",
"type": [
"null",
{"type":"long","logicalType":"timestamp-micros"}
,"string"
],
"default":null
}
🔴 We upgraded the MSSQL source connector to the v4.1.3
end of July, and observed that DatetimeOffset
fields data was now stored into long_value
instead of string_value
in the AVRO files.
Nothing bad to retrieve the data incrementally and read it via external table in BigQuery until this week, where our SQL query reading the AVRO files via external table failed with below error:
Error while reading table: xxx, error message: Invalid timestamp value '-62135769600000000' for field 'long_timestamp_micros_value' of type 'long' File: bigstore/xxx.avro
ℹ️ We identified that DatetimeOffset
field value in MSSQL database is 0001-01-01 00:00:00.0000000 +00:00
, leading to the -62135769600000000
long value, instead of 0001-01-01 00:00:00.0000000 +00:00
string value before the connector upgrade.
The only workaround on our side is to disable the logical_types inference on our external table in order to manually parse and manage long values, including not "timestampable" ones like -62135769600000000
.
🙏 Please could you have a look at this issue in order to continue storing the original DatetimeOffset
field value as string instead of long if logical inference is not possible for extreme values ?
NB:
- reading the AVRO file using Python library also produced an error at inference time for this value
- as Parquet uses AVRO at first step, I suppose the issue is also present for PARQUET
How to reproduce the issue
- Ensure using mssql-source connector version > 4.1.3
- Create a MSSQL table with
DatetimeOffset
field with some rows, including one with value0001-01-01 00:00:00.0000000 +00:00
- Trigger the sync to any destination using AVRO like GCS
- Read the resulting AVRO file using logical types inference => KO
- Read the resulting AVRO file without using logical types inference => OK
Possibly PR causing the issue
- Remove custom debezium mssql converted for datetimeoffset type #39342
- [source-mssql] correctly parse type datetimeoffset #39419
Relevant log output
No response
Contribute
- Yes, I want to contribute