Skip to content

Commit c7d853c

Browse files
bazarnovschlattk
authored andcommitted
🎉 Base-normalization: Implement normalization for MSSQL-destination (airbytehq#6079)
See the attached PR (airbytehq#6079)
1 parent 3469dfd commit c7d853c

File tree

131 files changed

+4246
-50
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

131 files changed

+4246
-50
lines changed

‎airbyte-integrations/bases/base-normalization/Dockerfile

+21-3
Original file line numberDiff line numberDiff line change
@@ -4,9 +4,25 @@ USER root
44
WORKDIR /tmp
55
RUN apt-get update && apt-get install -y \
66
wget \
7+
curl \
78
unzip \
89
libaio-dev \
9-
libaio1
10+
libaio1 \
11+
gnupg \
12+
gnupg1 \
13+
gnupg2
14+
15+
# Install MS SQL Server dependencies
16+
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
17+
RUN curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list
18+
RUN apt-get update && ACCEPT_EULA=Y apt-get install -y \
19+
libgssapi-krb5-2 \
20+
unixodbc-dev \
21+
msodbcsql17 \
22+
mssql-tools
23+
ENV PATH=$PATH:/opt/mssql-tools/bin
24+
25+
# Install Oracle dependencies
1026
RUN mkdir -p /opt/oracle
1127
RUN wget https://download.oracle.com/otn_software/linux/instantclient/19600/instantclient-basic-linux.x64-19.6.0.0.0dbru.zip
1228
RUN unzip instantclient-basic-linux.x64-19.6.0.0.0dbru.zip -d /opt/oracle
@@ -17,8 +33,8 @@ RUN pip install cx_Oracle
1733

1834
COPY --from=airbyte/base-airbyte-protocol-python:0.1.1 /airbyte /airbyte
1935

36+
# Install SSH Tunneling dependencies
2037
RUN apt-get update && apt-get install -y jq sshpass
21-
2238
WORKDIR /airbyte
2339
COPY entrypoint.sh .
2440
COPY build/sshtunneling.sh .
@@ -28,13 +44,15 @@ COPY normalization ./normalization
2844
COPY setup.py .
2945
COPY dbt-project-template/ ./dbt-template/
3046

47+
# Install python dependencies
3148
WORKDIR /airbyte/base_python_structs
3249
RUN pip install .
3350

3451
WORKDIR /airbyte/normalization_code
3552
RUN pip install .
3653
RUN pip install dbt-oracle==0.4.3
3754
RUN pip install git+https://github.com/dbeatty10/dbt-mysql@96655ea9f7fca7be90c9112ce8ffbb5aac1d3716#egg=dbt-mysql
55+
RUN pip install dbt-sqlserver==0.19.3
3856

3957

4058
WORKDIR /airbyte/normalization_code/dbt-template/
@@ -45,5 +63,5 @@ WORKDIR /airbyte
4563
ENV AIRBYTE_ENTRYPOINT "/airbyte/entrypoint.sh"
4664
ENTRYPOINT ["/airbyte/entrypoint.sh"]
4765

48-
LABEL io.airbyte.version=0.1.49
66+
LABEL io.airbyte.version=0.1.50
4967
LABEL io.airbyte.name=airbyte/normalization

‎airbyte-integrations/bases/base-normalization/README.md

+20
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,25 @@ Related documentation on normalization is available here:
99

1010
Below are short descriptions of the kind of tests that may be affected by changes to the normalization code.
1111

12+
### Build & Activate Virtual Environment and install dependencies
13+
From this connector directory, create a virtual environment:
14+
```
15+
python3 -m venv .venv
16+
```
17+
18+
This will generate a virtualenv for this module in `.venv/`. Make sure this venv is active in your
19+
development environment of choice. To activate it from the terminal, run:
20+
```
21+
source .venv/bin/activate
22+
pip install -r requirements.txt
23+
```
24+
If you are in an IDE, follow your IDE's instructions to activate the virtualenv.
25+
26+
Note that while we are installing dependencies from `requirements.txt`, you should only edit `setup.py` for your dependencies. `requirements.txt` is
27+
used for editable installs (`pip install -e`) to pull in Python dependencies from the monorepo and will call `setup.py`.
28+
If this is mumbo jumbo to you, don't worry about it, just put your deps in `setup.py` but install using `pip install -r requirements.txt` and everything
29+
should work as you expect.
30+
1231
## Unit Tests
1332

1433
Unit tests are automatically included when building the normalization project.
@@ -56,6 +75,7 @@ allowed characters, if quotes are needed or not, and the length limitations:
5675
- [snowflake](../../../docs/integrations/destinations/snowflake.md)
5776
- [mysql](../../../docs/integrations/destinations/mysql.md)
5877
- [oracle](../../../docs/integrations/destinations/oracle.md)
78+
- [mssql](../../../docs/integrations/destinations/mssql.md)
5979

6080
Rules about truncations, for example for both of these strings which are too long for the postgres 64 limit:
6181
- `Aaaa_Bbbb_Cccc_Dddd_Eeee_Ffff_Gggg_Hhhh_Iiii`

‎airbyte-integrations/bases/base-normalization/build.gradle

+1
Original file line numberDiff line numberDiff line change
@@ -44,6 +44,7 @@ task("customIntegrationTestPython", type: PythonTask, dependsOn: installTestReqs
4444
dependsOn ':airbyte-integrations:connectors:destination-redshift:airbyteDocker'
4545
dependsOn ':airbyte-integrations:connectors:destination-snowflake:airbyteDocker'
4646
dependsOn ':airbyte-integrations:connectors:destination-oracle:airbyteDocker'
47+
dependsOn ':airbyte-integrations:connectors:destination-mssql:airbyteDocker'
4748

4849
}
4950

‎airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/array.sql

+17
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,7 @@
44
- Snowflake: flatten() -> https://docs.snowflake.com/en/sql-reference/functions/flatten.html
55
- Redshift: -> https://blog.getdbt.com/how-to-unnest-arrays-in-redshift/
66
- postgres: unnest() -> https://www.postgresqltutorial.com/postgresql-array/
7+
- MSSQL: openjson() –> https://docs.microsoft.com/en-us/sql/relational-databases/json/validate-query-and-change-json-data-with-built-in-functions-sql-server?view=sql-server-ver15
78
#}
89

910
{# cross_join_unnest ------------------------------------------------- #}
@@ -44,6 +45,17 @@
4445
cross join table(flatten({{ array_col }})) as {{ array_col }}
4546
{%- endmacro %}
4647

48+
{% macro sqlserver__cross_join_unnest(stream_name, array_col) -%}
49+
{# https://docs.microsoft.com/en-us/sql/relational-databases/json/convert-json-data-to-rows-and-columns-with-openjson-sql-server?view=sql-server-ver15#option-1---openjson-with-the-default-output #}
50+
CROSS APPLY (
51+
SELECT [value] = CASE
52+
WHEN [type] = 4 THEN (SELECT [value] FROM OPENJSON([value]))
53+
WHEN [type] = 5 THEN [value]
54+
END
55+
FROM OPENJSON({{ array_col }})
56+
) AS {{ array_col }}
57+
{%- endmacro %}
58+
4759
{# unnested_column_value -- this macro is related to unnest_cte #}
4860

4961
{% macro unnested_column_value(column_col) -%}
@@ -74,6 +86,11 @@
7486
{{ column_col }}
7587
{%- endmacro %}
7688

89+
{% macro sqlserver__unnested_column_value(column_col) -%}
90+
{# unnested array/sub_array will be located in `value` column afterwards, we need to address to it #}
91+
{{ column_col }}.value
92+
{%- endmacro %}
93+
7794
{# unnest_cte ------------------------------------------------- #}
7895

7996
{% macro unnest_cte(table_name, stream_name, column_col) -%}

‎airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/concat.sql

+10
Original file line numberDiff line numberDiff line change
@@ -14,3 +14,13 @@
1414
{% macro postgres__concat(fields) %}
1515
{{ dbt_utils.alternative_concat(fields) }}
1616
{% endmacro %}
17+
18+
{% macro sqlserver__concat(fields) -%}
19+
{#-- CONCAT() in SQL SERVER accepts from 2 to 254 arguments, we use batches for the main concat, to overcome the limit. --#}
20+
{% set concat_chunks = [] %}
21+
{% for chunk in fields|batch(253) -%}
22+
{% set _ = concat_chunks.append( "concat(" ~ chunk|join(', ') ~ ",'')" ) %}
23+
{% endfor %}
24+
25+
concat({{ concat_chunks|join(', ') }}, '')
26+
{%- endmacro %}

‎airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/datatypes.sql

+27-3
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,10 @@
2828
json
2929
{%- endmacro -%}
3030

31+
{%- macro sqlserver__type_json() -%}
32+
VARCHAR(max)
33+
{%- endmacro -%}
34+
3135

3236
{# string ------------------------------------------------- #}
3337

@@ -39,6 +43,10 @@
3943
varchar2(4000)
4044
{%- endmacro -%}
4145

46+
{% macro sqlserver__type_string() %}
47+
VARCHAR(max)
48+
{%- endmacro -%}
49+
4250

4351
{# float ------------------------------------------------- #}
4452
{% macro mysql__type_float() %}
@@ -69,17 +77,23 @@
6977
{% endmacro %}
7078

7179

72-
{# numeric ------------------------------------------------- #}
80+
{# numeric ------------------------------------------------- --#}
7381
{% macro mysql__type_numeric() %}
7482
float
7583
{% endmacro %}
7684

7785

78-
{# timestamp ------------------------------------------------- #}
86+
{# timestamp ------------------------------------------------- --#}
7987
{% macro mysql__type_timestamp() %}
8088
time
8189
{% endmacro %}
8290

91+
{%- macro sqlserver__type_timestamp() -%}
92+
{#-- in TSQL timestamp is really datetime --#}
93+
{#-- https://docs.microsoft.com/en-us/sql/t-sql/functions/date-and-time-data-types-and-functions-transact-sql?view=sql-server-ver15#DateandTimeDataTypes --#}
94+
datetime
95+
{%- endmacro -%}
96+
8397

8498
{# timestamp with time zone ------------------------------------------------- #}
8599

@@ -95,7 +109,7 @@
95109
timestamp
96110
{% endmacro %}
97111

98-
{# MySQL doesnt allow cast operation to work with TIMESTAMP so we have to use char #}
112+
{#-- MySQL doesnt allow cast operation to work with TIMESTAMP so we have to use char --#}
99113
{%- macro mysql__type_timestamp_with_timezone() -%}
100114
char
101115
{%- endmacro -%}
@@ -104,6 +118,12 @@
104118
varchar2(4000)
105119
{% endmacro %}
106120

121+
{%- macro sqlserver__type_timestamp_with_timezone() -%}
122+
{#-- in TSQL timestamp is really datetime or datetime2 --#}
123+
{#-- https://docs.microsoft.com/en-us/sql/t-sql/functions/date-and-time-data-types-and-functions-transact-sql?view=sql-server-ver15#DateandTimeDataTypes --#}
124+
datetime
125+
{%- endmacro -%}
126+
107127

108128
{# date ------------------------------------------------- #}
109129

@@ -118,3 +138,7 @@
118138
{% macro oracle__type_date() %}
119139
varchar2(4000)
120140
{% endmacro %}
141+
142+
{%- macro sqlserver__type_date() -%}
143+
date
144+
{%- endmacro -%}
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
{# converting hash in varchar _macro #}
2+
3+
{% macro sqlserver__hash(field) -%}
4+
convert(varchar(32), HashBytes('md5', coalesce(cast({{field}} as {{dbt_utils.type_string()}}), '')), 2)
5+
{%- endmacro %}

‎airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/json_operations.sql

+21
Original file line numberDiff line numberDiff line change
@@ -57,6 +57,15 @@
5757
{{ "'\"" ~ str_list|join('"."') ~ "\"'" }}
5858
{%- endmacro %}
5959

60+
{% macro sqlserver__format_json_path(json_path_list) -%}
61+
{# -- '$."x"."y"."z"' #}
62+
{%- set str_list = [] -%}
63+
{%- for json_path in json_path_list -%}
64+
{%- if str_list.append(json_path.replace("'", "''").replace('"', '\\"')) -%} {%- endif -%}
65+
{%- endfor -%}
66+
{{ "'$.\"" ~ str_list|join(".") ~ "\"'" }}
67+
{%- endmacro %}
68+
6069
{# json_extract ------------------------------------------------- #}
6170

6271
{% macro json_extract(from_table, json_column, json_path_list, normalized_json_path) -%}
@@ -111,6 +120,10 @@
111120
{% endif -%}
112121
{%- endmacro %}
113122

123+
{% macro sqlserver__json_extract(from_table, json_column, json_path_list, normalized_json_path) -%}
124+
json_query({{ json_column }}, {{ format_json_path(json_path_list) }})
125+
{%- endmacro %}
126+
114127
{# json_extract_scalar ------------------------------------------------- #}
115128

116129
{% macro json_extract_scalar(json_column, json_path_list, normalized_json_path) -%}
@@ -145,6 +158,10 @@
145158
to_varchar(get_path(parse_json({{ json_column }}), {{ format_json_path(json_path_list) }}))
146159
{%- endmacro %}
147160

161+
{% macro sqlserver__json_extract_scalar(json_column, json_path_list, normalized_json_path) -%}
162+
json_value({{ json_column }}, {{ format_json_path(json_path_list) }})
163+
{%- endmacro %}
164+
148165
{# json_extract_array ------------------------------------------------- #}
149166

150167
{% macro json_extract_array(json_column, json_path_list, normalized_json_path) -%}
@@ -178,3 +195,7 @@
178195
{% macro snowflake__json_extract_array(json_column, json_path_list, normalized_json_path) -%}
179196
get_path(parse_json({{ json_column }}), {{ format_json_path(json_path_list) }})
180197
{%- endmacro %}
198+
199+
{% macro sqlserver__json_extract_array(json_column, json_path_list, normalized_json_path) -%}
200+
json_query({{ json_column }}, {{ format_json_path(json_path_list) }})
201+
{%- endmacro %}

‎airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/type_conversions.sql

+9
Original file line numberDiff line numberDiff line change
@@ -29,6 +29,10 @@
2929
cast({{ array_column }} as varchar2(4000))
3030
{%- endmacro %}
3131

32+
{% macro sqlserver__array_to_string(array_column) -%}
33+
cast({{ array_column }} as {{dbt_utils.type_string()}})
34+
{%- endmacro %}
35+
3236
{# cast_to_boolean ------------------------------------------------- #}
3337
{% macro cast_to_boolean(field) -%}
3438
{{ adapter.dispatch('cast_to_boolean')(field) }}
@@ -47,3 +51,8 @@
4751
{% macro redshift__cast_to_boolean(field) -%}
4852
cast(decode({{ field }}, 'true', '1', 'false', '0')::integer as boolean)
4953
{%- endmacro %}
54+
55+
{# -- MS SQL Server does not support converting string directly to boolean, it must be casted as bit #}
56+
{% macro sqlserver__cast_to_boolean(field) -%}
57+
cast({{ field }} as bit)
58+
{%- endmacro %}

0 commit comments

Comments
 (0)