
Commit d73c156

binste and cpcloud authored and committed
docs(blog): add dbt-ibis post
1 parent 0aaad00 commit d73c156


1 file changed, +117 -0 lines changed


docs/posts/dbt-ibis/index.qmd

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
---
title: "dbt-ibis: Write your dbt models using Ibis"
author: "Stefan Binder"
date: "2023-11-24"
categories:
  - blog
  - dbt
  - data engineering
---

# Introduction to dbt
[dbt](https://github.com/dbt-labs/dbt-core) has revolutionized how transformations are orchestrated and managed within modern data warehouses. Initially released in 2016, dbt quickly gained traction within the data analytics community because it brings software engineering best practices, such as modularity, portability, CI/CD, and documentation, to analytics code.

At the heart of dbt are so-called "models", which are simply SQL SELECT statements. dbt removes the need to write any DDL/DML, so users can focus on the SELECT statements themselves. Depending on how you configure a model, its query is materialized as a table, a view, or a custom materialization. dbt also infers the dependencies between models from their `ref()` calls and runs the models in the right order. The following dbt model selects from two other models, `stg_orders` and `stg_customers`:

```sql
WITH customer_orders AS (
    SELECT
        customer_id AS customer_id,
        MIN(order_date) AS first_order,
        MAX(order_date) AS most_recent_order,
        COUNT(*) AS number_of_orders
    FROM {{ ref('stg_orders') }} AS orders
    GROUP BY
        customer_id
), customer_orders_info AS (
    SELECT
        customers.customer_id AS customer_id,
        customers.first_name AS first_name,
        customers.last_name AS last_name,
        customer_orders.customer_id AS customer_id_right,
        customer_orders.first_order AS first_order,
        customer_orders.most_recent_order AS most_recent_order,
        customer_orders.number_of_orders AS number_of_orders
    FROM {{ ref('stg_customers') }} AS customers
    LEFT OUTER JOIN customer_orders
        ON customers.customer_id = customer_orders.customer_id
)
SELECT
    customer_id,
    first_name,
    last_name,
    first_order,
    most_recent_order,
    number_of_orders
FROM customer_orders_info
```

dbt ensures that this model is built only after `stg_orders` and `stg_customers` have been built. The model is inspired by the [jaffle shop demo project by dbt Labs](https://github.com/dbt-labs/jaffle_shop), where you can find more example queries.
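
The ordering guarantee comes from the `ref()` calls. Each call is an edge in a dependency graph, and the models run in a topological order of that graph. The sketch below illustrates the idea with Python's standard-library `graphlib` (Python 3.9+). It assumes a simplified approach: it scans raw SQL with a regular expression, whereas dbt itself records `ref()` calls while rendering the Jinja. Treat it as a model of the concept rather than dbt's implementation. The `raw_*` source names and the `customers` model name are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Matches {{ ref('name') }} or {{ ref("name") }}; simplified assumption,
# it does not handle two-argument refs such as ref('package', 'model').
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*['"]([^'"]+)['"]\s*\)\s*\}\}""")


def find_refs(sql: str) -> set[str]:
    """Return the names of all models referenced via ref() in a SQL string."""
    return set(REF_PATTERN.findall(sql))


def build_order(models: dict[str, str]) -> list[str]:
    """Return model names so that every model comes after the models it refs."""
    # TopologicalSorter expects a mapping of node -> predecessors.
    graph = {name: find_refs(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical project: two staging models and the customers model from above.
models = {
    "customers": (
        "SELECT * FROM {{ ref('stg_customers') }} AS customers "
        "LEFT OUTER JOIN {{ ref('stg_orders') }} AS orders USING (customer_id)"
    ),
    "stg_orders": "SELECT * FROM raw_orders",
    "stg_customers": "SELECT * FROM raw_customers",
}

order = build_order(models)
print(order)
```

Both staging models always appear before `customers`. Their relative order is not constrained, because neither refs the other.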

At the end of 2022, dbt added support for [Python models](https://docs.getdbt.com/docs/build/python-models) on specific platforms (Snowflake, Databricks, Google Cloud Platform). These can be useful for complex transformations, such as applying a machine learning model and storing the results. However, your Python code then has to run on the data platform itself, and the data often has to be moved into a Python process, which can be slower than leveraging the power of a modern SQL engine.


# Why dbt and Ibis go great together
[dbt-ibis](https://github.com/binste/dbt-ibis) offers a lightweight and compatible alternative that lets you write dbt models using Ibis. dbt-ibis transparently converts your Ibis expressions into SQL and then hands that SQL over to dbt. Your database does not need to support Python, because everything runs in the same process as dbt. As a result, you can write models in Python with any dbt adapter whose database is also a supported Ibis backend. Rewriting the SQL model above in Ibis gives:

```python
from dbt_ibis import depends_on, ref


# The referenced models are passed to `model` as Ibis tables, in the order listed.
@depends_on(ref("stg_customers"), ref("stg_orders"))
def model(customers, orders):
    # One row per customer with their order statistics
    customer_orders = orders.group_by("customer_id").aggregate(
        first_order=orders["order_date"].min(),
        most_recent_order=orders["order_date"].max(),
        number_of_orders=orders.count(),
    )
    # Left join onto customers to add first_name and last_name
    customer_orders = customers.join(customer_orders, "customer_id", how="left")
    # Keep only the output columns
    return customer_orders.select(
        "customer_id",
        "first_name",
        "last_name",
        "first_order",
        "most_recent_order",
        "number_of_orders",
    )
```
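
In this model, the decorator declares the upstream models, and the referenced tables arrive as arguments in the declared order: `customers` receives `stg_customers` and `orders` receives `stg_orders`. The following is a minimal sketch of how such a decorator pattern can work. It is not dbt-ibis's implementation. `ToyRef`, `toy_ref`, `toy_depends_on` and `run_model` are hypothetical names, and plain strings stand in for Ibis tables.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical stand-in for a reference to another dbt model."""
    name: str


def toy_ref(name: str) -> ToyRef:
    return ToyRef(name)


def toy_depends_on(*refs: ToyRef) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Record the upstream models on the function without changing its behaviour."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        func.depends_on = refs  # metadata a build tool can read before calling func
        return func
    return decorator


def run_model(model: Callable[..., Any], tables: dict[str, Any]) -> Any:
    """Resolve each declared ref to a table object and pass them positionally."""
    args = [tables[r.name] for r in model.depends_on]
    return model(*args)


@toy_depends_on(toy_ref("stg_customers"), toy_ref("stg_orders"))
def model(customers, orders):
    # A real model would build an Ibis expression; here we just show what it received.
    return f"{customers} JOIN {orders}"


result = run_model(model, {"stg_orders": "orders_table", "stg_customers": "customers_table"})
print(result)  # customers_table JOIN orders_table
```

Because the dependencies are declared as data rather than buried in SQL strings, a tool can read them and build the dependency graph before the model function ever runs.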

Using Ibis instead of SQL for dbt models brings you many advantages:

* Type checks and validation before your code is executed in a database.
* More composable code, as you can break complex queries down into smaller pieces.
* Better reusability of code. dbt lets you use [Jinja and macros](https://docs.getdbt.com/docs/build/jinja-macros), which is an improvement over plain SQL, but this only gets you so far because string manipulation is inherently fragile. With dbt-ibis, you can easily share common code between models.
* Your dbt models become backend-agnostic, which reduces lock-in to a specific database. Furthermore, you get the possibility of building a [multi-engine data stack](https://juhache.substack.com/p/n-engines-1-language?publication_id=1211981&post_id=137718100). For example, you could use DuckDB for small to medium workloads, and use Snowflake for heavy workloads and as the end-user and BI layer, leveraging its governance features. Depending on the size of your warehouse, this can result in significant cost savings.
* You can unit test your code with your favorite Python testing framework, such as pytest.
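
The first and last points reinforce each other. Because an Ibis table carries its schema, referencing a column that does not exist fails while the expression is being built, and an ordinary unit test can check that behaviour. The toy class below mimics the idea using only the standard library. `SchemaTable` and its `select` method are hypothetical and far simpler than Ibis. The test functions follow pytest's naming convention but also run as plain asserts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaTable:
    """Hypothetical toy: a table known only by name and schema, with no database connection."""
    name: str
    schema: dict[str, str]  # column name -> type name

    def select(self, *columns: str) -> str:
        """Validate column names against the schema, then render a SELECT statement."""
        unknown = [c for c in columns if c not in self.schema]
        if unknown:
            # Fails before any SQL is produced, let alone sent to a database.
            raise KeyError(f"{self.name} has no column(s): {', '.join(unknown)}")
        return f"SELECT {', '.join(columns)} FROM {self.name}"


def test_select_known_columns():
    customers = SchemaTable("stg_customers", {"customer_id": "int64", "first_name": "string"})
    assert customers.select("customer_id", "first_name") == (
        "SELECT customer_id, first_name FROM stg_customers"
    )


def test_select_unknown_column_fails_early():
    customers = SchemaTable("stg_customers", {"customer_id": "int64"})
    try:
        customers.select("customer_idd")
    except KeyError as err:
        assert "customer_idd" in str(err)
    else:
        raise AssertionError("expected a KeyError for a misspelled column")


if __name__ == "__main__":
    test_select_known_columns()
    test_select_unknown_column_fails_early()
    print("all checks passed")
```

With Ibis itself, the same kind of test would build the model from tables that have a declared schema and assert on the resulting expression. No warehouse connection is needed.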

In addition, you can stick to the tool you like (Ibis), whether you're writing an ingestion pipeline, a dbt model that transforms the data in your data warehouse, or an ad-hoc analysis in a Jupyter notebook.

Be aware that dbt-ibis currently has one limitation: you cannot connect to the database from within your dbt models. You use Ibis purely to construct a SELECT statement, so you cannot execute statements and act on their results.

# Further reading
If you want to give dbt-ibis a try, head over to the [GitHub repo](https://github.com/binste/dbt-ibis/blob/main/README.md) for more information on how to get up and running in no time!

For more details on the future of Ibis integration in dbt, check out [this PR](https://github.com/dbt-labs/dbt-core/pull/5274#issuecomment-1132772028) and [this GitHub issue](https://github.com/dbt-labs/dbt-core/issues/6184) about adding an official plugin system to dbt. Such a plugin system could provide first-class support for modeling languages in general and might allow dbt-ibis to offer an even better user experience and more features. See also this [discussion on Ibis as a dataframe API in the dbt GitHub repo](https://github.com/dbt-labs/dbt-core/discussions/5738).
