Commit 8696b12

dbt: Refactor content between index vs. usage pages, plus copy-editing
1 parent 7a49f6f commit 8696b12

2 files changed: +153 -100 lines changed

docs/integrate/dbt/index.md

Lines changed: 55 additions & 14 deletions
@@ -1,7 +1,10 @@
11
(dbt)=
2-
32
# dbt
43

4+
:::{include} /_include/links.md
5+
:::
6+
7+
## About
58
```{div}
69
:style: "float: right"
710
[![](https://www.getdbt.com/ui/img/logos/dbt-logo.svg){w=180px}](https://www.getdbt.com/)
@@ -57,13 +60,31 @@ scale.
5760
:::
5861

5962

63+
### dbt's Features
64+
The data abstraction layer provided by [dbt-core] allows the decoupling of
65+
the models on which reports and dashboards rely from the source data. When
66+
business rules or source systems change, you can still maintain the same models
67+
as a stable interface.
68+
69+
Some of the things that dbt can do include:
70+
71+
* Import reference data from CSV files.
72+
* Track changes in source data with different strategies so that downstream
73+
models do not need to be built every time from scratch.
74+
* Run tests on data, to confirm assumptions remain valid, and to validate
75+
any changes made to the models' logic.
76+
77+
### CrateDB's Benefits
78+
Due to its unique capabilities, CrateDB is an excellent warehouse choice for
79+
data transformation projects. It offers automatic indexing, fast aggregations,
80+
easy partitioning, and the ability to scale horizontally.
81+
82+
6083
## Setup
6184
Install the most recent version of the [dbt-cratedb2] Python package.
6285
```shell
6386
pip install --upgrade 'dbt-cratedb2'
6487
```
65-
dbt-cratedb2 is based on dbt-postgres, which uses [psycopg2] to connect to
66-
the database server.
6788

6889

6990
## Configure
@@ -91,26 +112,49 @@ cratedb_analytics:
91112
92113
## Learn
93114
115+
Learn how to use CrateDB with dbt by exploring concise examples.
116+
94117
:::{rubric} Tutorials
95118
:::
96119
97-
:::::{grid}
98-
::::{grid-item-card}
120+
::::{grid} 2
121+
:gutter: 5
122+
123+
:::{grid-item-card}
99124
:link: dbt-usage
100125
:link-type: ref
101-
Advanced configuration options and other usage guidelines.
126+
:link-alt: dbt usage guidelines
127+
:padding: 3
128+
:class-card: sd-text-center sd-pt-4
129+
:class-header: sd-fs-4
130+
{material-outlined}`integration_instructions;2.5em`
131+
Usage Guidelines
132+
^^^
102133
```{toctree}
103134
:maxdepth: 2
135+
:hidden:
104136
105137
usage
106138
```
107-
::::
108-
::::{grid-item-card}
139+
+++
140+
Usage guidelines, notes, and advanced configuration options.
141+
:::
142+
143+
:::{grid-item-card}
109144
:link: https://github.com/crate/cratedb-examples/tree/main/framework/dbt/
110145
:link-type: url
111-
A few dbt example projects using CrateDB.
146+
:link-alt: dbt CrateDB Examples
147+
:padding: 3
148+
:class-card: sd-text-center sd-pt-4
149+
:class-header: sd-fs-4
150+
{material-outlined}`apps;2.5em`
151+
Example Projects
152+
^^^
153+
+++
154+
Explore a few dbt example projects using CrateDB.
155+
:::
156+
112157
::::
113-
:::::
114158

115159

116160
:::{rubric} Webinars
@@ -142,12 +186,9 @@ and then publish your project to a GitHub repository.
142186
::::
143187

144188

145-
146-
[custom schemas with dbt]: https://docs.getdbt.com/docs/build/custom-schemas
147189
[dbt]: https://www.getdbt.com/
190+
[dbt-core]: https://github.com/dbt-labs/dbt-core
148191
[dbt-cratedb2]: https://pypi.org/project/dbt-cratedb2/
149192
[dbt Cloud]: https://www.getdbt.com/product/dbt-cloud/
150193
[dbt Postgres Setup]: https://docs.getdbt.com/docs/core/connect-data-platform/postgres-setup
151-
[Using dbt with CrateDB]: https://community.cratedb.com/t/using-dbt-with-cratedb/1566
152-
[psycopg2]: https://pypi.org/project/psycopg2/
153194
[`profiles.yml`]: https://docs.getdbt.com/docs/core/connect-data-platform/profiles.yml

docs/integrate/dbt/usage.md

Lines changed: 98 additions & 86 deletions
Original file line numberDiff line numberDiff line change
@@ -1,40 +1,18 @@
11
(dbt-usage)=
2-
32
# Using dbt with CrateDB
43

5-
_Guidelines for transforming data using dbt and CrateDB._
6-
7-
## Introduction
8-
9-
### dbt's Features
10-
The data abstraction layer provided by [dbt][dbt-core] allows the decoupling of
11-
the models on which reports and dashboards rely from the source data. When
12-
business rules or source systems change, you can still maintain the same models
13-
as a stable interface.
14-
15-
Some of the things that dbt can do include:
16-
17-
* Import reference data from CSV files
18-
* Track changes in source data with different strategies so that downstream
19-
models do not need to be built every time from scratch.
20-
* Run tests on data, to confirm assumptions remain valid, and to validate
21-
any changes made to the models' logic.
4+
:::{include} /_include/links.md
5+
:::
226

23-
### CrateDB's Benefits
24-
Due to its unique capabilities, CrateDB is an excellent warehouse choice for
25-
data transformation projects. It offers automatic indexing, fast aggregations,
26-
easy partitioning, and the ability to scale horizontally.
27-
28-
29-
## Setup
7+
_Setup instructions and guidelines for transforming data using dbt and CrateDB._
308

9+
:::{div}
3110
For running the following steps, you will need connectivity to a CrateDB
32-
cluster, and a Python installation on your workstation. The starting point
33-
will be a fresh installation of `dbt-cratedb2`.
11+
cluster, and a Python installation on your workstation. You can use
12+
[CrateDB Self-Managed] or [CrateDB Cloud].
13+
:::
3414

35-
```bash
36-
pip install --upgrade 'dbt-cratedb2'
37-
```
15+
## Setup
3816

3917
To start a CrateDB instance for evaluation purposes, use Docker or Podman.
4018
```shell
@@ -43,12 +21,18 @@ docker run --rm \
4321
--env=CRATE_HEAP_SIZE=2g crate:latest
4422
```
4523

46-
**dbt Profile Configuration:** CrateDB targets should be set up using the
47-
following configuration in your connection profile, e.g. within a
48-
[`profiles.yml`] file at `~/.dbt/profiles.yml`.
24+
Install the most recent version of the [dbt-cratedb2] Python package.
25+
```shell
26+
pip install --upgrade 'dbt-cratedb2'
27+
```
28+
:::{note}
29+
dbt-cratedb2 is based on dbt-postgres, which uses [psycopg2] to connect to
30+
the database server.
31+
:::
4932

50-
Now, create a connection profile `profiles.yaml` file including your
51-
connection details, for example at `~/.dbt/profiles.yml`.
33+
## Configure
34+
A minimal set of **dbt profile configuration** options, for example within a
35+
[`profiles.yml`] file at `~/.dbt/profiles.yml`.
5236
```bash
5337
cd ~
5438
mkdir -p .dbt
@@ -67,66 +51,23 @@ cratedb_analytics:
6751
search_path: doc
6852
EOF
6953
```
70-
(please note the values for `database`, `schema`, and `search_path` in this example)
54+
Please note the values for `dbname`, `schema`, and `search_path` in this example.
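
As a minimal sketch of what to look for in those values, the following Python
snippet checks a profile target against the CrateDB conventions described on
this page: the fixed catalog name `crate` for `dbname`, and a schema name such
as `doc`. The helper name `check_cratedb_target` and the `localhost` host are
assumptions made for illustration; neither is part of dbt or dbt-cratedb2.

```python
# Hypothetical helper for illustration; not part of dbt or dbt-cratedb2.
def check_cratedb_target(target: dict) -> list[str]:
    """Return a list of warnings for a dbt profile target dictionary."""
    warnings = []
    if target.get("type") != "cratedb":
        warnings.append("type should be 'cratedb'")
    if target.get("dbname") != "crate":
        warnings.append("dbname should be 'crate', CrateDB's only catalog")
    if not target.get("schema"):
        warnings.append("schema is missing; 'doc' is CrateDB's default schema")
    return warnings


# Example target. Host and user are assumptions for a local evaluation instance.
target = {
    "type": "cratedb",
    "host": "localhost",
    "port": 5432,
    "user": "crate",
    "dbname": "crate",
    "schema": "doc",
    "search_path": "doc",
}

# Print each warning; a target that follows the conventions prints nothing.
for message in check_cratedb_target(target):
    print(message)
```

The check is deliberately small: it only covers the values called out above,
and leaves connectivity testing to dbt itself.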
7155

72-
A dbt project has a [specific structure][dbt-project-structure], and contains a combination of SQL, Jinja, YAML, and Markdown files.
56+
## Project
57+
When working with dbt, you work within a dbt project.
58+
A dbt project has a [specific structure][dbt-project-structure], and contains a
59+
combination of SQL, Jinja, YAML, and Markdown files.
7360
In your project folder, alongside the `models` folder that most projects have,
7461
a folder called `macros` can include macro override files.
7562

76-
77-
Those dbt features have been tested successfully:
78-
79-
* models with [view, table, and ephemeral materializations](https://docs.getdbt.com/docs/build/materializations)
80-
* [dbt source freshness](https://docs.getdbt.com/docs/deploy/source-freshness)
81-
* [dbt test](https://docs.getdbt.com/docs/build/tests)
82-
* [dbt seed](https://docs.getdbt.com/docs/build/seeds)
83-
* [Incremental materializations](https://docs.getdbt.com/docs/build/incremental-models) (with `incremental_strategy='delete+insert'` and without involving [OBJECT](https://crate.io/docs/crate/reference/en/5.4/general/ddl/data-types.html#objects) columns)
84-
85-
We hope you find this useful. CrateDB is continuously adding new features and we will endeavor to come back and update this article if there are any developments and some of these overrides require changes or become obsolete.
86-
63+
At [cratedb-examples » framework/dbt], you can explore a few ready-to-run dbt
64+
projects that demonstrate usage with CrateDB.
8765

8866
## Appendix
8967

9068
A few notes about advanced configuration options and general usage
9169
information.
9270

93-
### CrateDB's Differences
94-
- CrateDB’s fixed catalog name is `crate`, the default schema name is `doc`.
95-
- CrateDB does not implement the notion of a database, however tables can be created in different [schemas](https://cratedb.com/docs/crate/reference/en/latest/general/ddl/create-table.html#ddl-create-table-schemas).
96-
- When asked for a database name, specifying a schema name (any), or the fixed catalog name `crate` may be applicable.
97-
- If a database-/schema-name is omitted while connecting, the PostgreSQL drivers may default to the “username”.
98-
- The predefined [superuser](https://cratedb.com/docs/crate/reference/en/latest/admin/user-management.html#administration-user-management) on an unconfigured CrateDB cluster is called `crate`, defined without a password.
99-
- For authenticating properly, please learn about the available [authentication](https://cratedb.com/docs/crate/reference/en/latest/admin/auth/index.html#admin-auth) options.
100-
101-
-- https://cratedb.com/docs/crate/clients-tools/en/latest/connect/#configure
102-
103-
### Connection Options
104-
**dbt Profile Configuration:** CrateDB targets should be set up using the
105-
following configuration in your [`profiles.yml`] file.
106-
```yaml
107-
company-name:
108-
target: dev
109-
outputs:
110-
dev:
111-
type: cratedb
112-
host: [clustername].aks1.westeurope.azure.cratedb.net
113-
user: [username]
114-
password: [password]
115-
port: 5432
116-
dbname: crate # CrateDB's only catalog is `crate`.
117-
schema: doc # You can define any schema. `doc` is the default.
118-
threads: [optional, 1 or more]
119-
[keepalives_idle](#keepalives_idle): 0 # default 0, indicating the system default. See below
120-
connect_timeout: 10 # default 10 seconds
121-
[retries](#retries): 1 # default 1 retry on error/timeout when opening connections
122-
[search_path](#search_path): [optional, override the default postgres search_path]
123-
[role](#role): [optional, set the role dbt assumes when executing queries]
124-
[sslmode](#sslmode): [optional, set the sslmode used to connect to the database]
125-
[sslcert](#sslcert): [optional, set the sslcert to control the certifcate file location]
126-
[sslkey](#sslkey): [optional, set the sslkey to control the location of the private key]
127-
[sslrootcert](#sslrootcert): [optional, set the sslrootcert config value to a new file path in order to customize the file location that contain root certificates]
128-
```
129-
13071
### Search Path
13172
The `search_path` config controls the CrateDB "search path" that dbt configures
13273
when opening new connections to the database. By default, the CrateDB search
@@ -154,7 +95,78 @@ the name generation according to your needs.
15495
{%- endmacro %}
15596
```
15697

98+
### Full Connection Options
99+
CrateDB targets should be set up using the following **dbt profile configuration** in
100+
your [`profiles.yml`] file, which is identical to the [setup options of dbt-postgres].
101+
```yaml
102+
cratedb_analytics:
103+
target: dev
104+
outputs:
105+
dev:
106+
type: cratedb
107+
host: [clustername].aks1.westeurope.azure.cratedb.net
108+
user: [username]
109+
password: [password]
110+
port: 5432
111+
dbname: crate # CrateDB's only catalog is `crate`.
112+
schema: doc # You can define any schema. `doc` is the default.
113+
threads: [optional, 1 or more]
114+
[keepalives_idle]: 0 # default 0, indicating the system default.
115+
connect_timeout: 10 # default 10 seconds
116+
[retries]: 1 # default 1 retry on error/timeout when opening connections
117+
[search_path]: # optional, override the default postgres `search_path`
118+
[role]: # optional, set the role dbt assumes when executing queries
119+
[sslmode]: # optional, set the `sslmode` used to connect to the database
120+
[sslcert]: # optional, set the `sslcert` to control the certificate file location
121+
[sslkey]: # optional, set the `sslkey` to control the location of the private key
122+
[sslrootcert]: # optional, set the `sslrootcert` config value to a new file path
123+
# in order to customize the file location that contains root certificates
124+
```
125+
126+
127+
## Notes
128+
129+
### CrateDB's Differences
130+
- CrateDB’s fixed catalog name is `crate`, the default schema name is `doc`.
131+
- CrateDB does not implement the notion of a database, however tables can be created in different [schemas](https://cratedb.com/docs/crate/reference/en/latest/general/ddl/create-table.html#ddl-create-table-schemas).
132+
- When asked for a database name, specifying a schema name (any), or the fixed catalog name `crate` may be applicable.
133+
- If a database-/schema-name is omitted while connecting, the PostgreSQL drivers may default to the “username”.
134+
- The predefined [superuser](https://cratedb.com/docs/crate/reference/en/latest/admin/user-management.html#administration-user-management) on an unconfigured CrateDB cluster is called `crate`, defined without a password.
135+
- For authenticating properly, please learn about the available [authentication](https://cratedb.com/docs/crate/reference/en/latest/admin/auth/index.html#admin-auth) options.
136+
137+
### Feature Coverage
138+
These dbt features have been tested successfully with CrateDB:
139+
140+
* [Model materializations](https://docs.getdbt.com/docs/build/materializations):
141+
table, view, incremental, ephemeral
142+
* [Incremental models](https://docs.getdbt.com/docs/build/incremental-models-overview)
143+
* [Source data freshness](https://docs.getdbt.com/docs/build/sources#source-data-freshness)
144+
* [CSV seeds](https://docs.getdbt.com/docs/build/seeds)
145+
* [Data tests](https://docs.getdbt.com/docs/build/tests)
146+
147+
### Caveats
148+
- Model materializations using the "materialized view" strategy are
149+
not supported yet.
150+
- Incremental materializations with CrateDB currently only support the
151+
`delete+insert` strategy.
152+
- Incremental materializations do not support columns using the
153+
{ref}`OBJECT <crate-reference:data-types-objects>` data type yet.
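
The caveats above can be expressed as a small pre-flight check. This is a
sketch under stated assumptions: the function `check_model_config` and the
dictionary layout are hypothetical and only mirror dbt's `materialized` and
`incremental_strategy` model settings; dbt-cratedb2 does not ship such a helper.

```python
# Hypothetical helper for illustration; not part of dbt or dbt-cratedb2.
def check_model_config(config: dict, column_types: dict) -> list[str]:
    """Return the caveats a model configuration would run into.

    `config` holds dbt model settings, `column_types` maps column
    names to their CrateDB data type names.
    """
    problems = []
    materialized = config.get("materialized")

    if materialized == "materialized_view":
        problems.append("materialized views are not supported yet")

    if materialized == "incremental":
        strategy = config.get("incremental_strategy")
        # An unset strategy is left alone, because the adapter's default
        # is not stated on this page.
        if strategy is not None and strategy != "delete+insert":
            problems.append(
                f"incremental strategy '{strategy}' is not supported; "
                "use 'delete+insert'"
            )
        # Collect columns whose type starts with OBJECT, e.g. OBJECT(DYNAMIC).
        object_columns = sorted(
            name
            for name, type_name in column_types.items()
            if type_name.upper().startswith("OBJECT")
        )
        if object_columns:
            problems.append(
                "incremental models do not support OBJECT columns yet: "
                + ", ".join(object_columns)
            )

    return problems


# Example usage with an assumed model configuration.
for problem in check_model_config(
    {"materialized": "incremental", "incremental_strategy": "merge"},
    {"id": "BIGINT", "payload": "OBJECT(DYNAMIC)"},
):
    print(problem)
```

Running such a check before `dbt run` is one possible way to surface these
limitations early; the authoritative behavior is that of the adapter itself.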
154+
155+
156+
:::{note}
157+
CrateDB is continuously adding new features and we will endeavor to come
158+
back and update this article if there are any updates or improvements.
159+
We are tracking interoperability issues per [Tool: dbt], and appreciate
160+
any contributions and reports.
161+
:::
162+
157163

164+
[cratedb-examples » framework/dbt]: https://github.com/crate/cratedb-examples/tree/main/framework/dbt/
165+
[custom schemas with dbt]: https://docs.getdbt.com/docs/build/custom-schemas
158166
[dbt]: https://www.getdbt.com/
159-
[dbt-core]: https://github.com/dbt-labs/dbt-core
167+
[dbt-cratedb2]: https://pypi.org/project/dbt-cratedb2/
160168
[dbt-project-structure]: https://docs.getdbt.com/guides/best-practices/how-we-structure/1-guide-overview
169+
[`profiles.yml`]: https://docs.getdbt.com/docs/core/connect-data-platform/profiles.yml
170+
[psycopg2]: https://pypi.org/project/psycopg2/
171+
[setup options of dbt-postgres]: https://docs.getdbt.com/docs/core/connect-data-platform/postgres-setup
172+
[Tool: dbt]: https://github.com/crate/crate/labels/tool%3A%20dbt
