
Concept Refs #11658



Open. Wants to merge 3 commits into main.


jaysobel commented May 22, 2025

These changes introduce a new abstraction, the cref, which extends the idea of a ref but is geared toward organizing business logic rather than materializing a database object.

This feature was built with Claude Code (Claude 4.0).

Problem

Consider these two mini dbt models in an imaginary DAG. Which is the better implementation?

(1)

select
  created_at::date as date,
  avg(customer_review) as avg_review
from {{ ref('fct_orders') }}
where status = 'completed'
group by 1

(2)

select
  o.created_at::date as date,
  avg(olcr.customer_review) as avg_review
from {{ ref('stg_orders') }} as o
join {{ ref('int_orders_latest_customer_reviews') }} as olcr
  on o.order_id = olcr.order_id
where o.status = 'completed'
group by 1

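Someone unfamiliar with dbt might think (1) is better because it's more DRY. But an experienced Analytics Engineer would know (2) is better because it avoids creating a dependency on fct_orders, which is probably a very deep node (that is, a heavy dependency) meant for external use, not as an input to further internal models. Because (2) depends only on shallow nodes, it can start building as soon as stg_orders and int_orders_latest_customer_reviews finish instead of waiting for fct_orders and everything upstream of it, so (2) parallelizes better.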
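To make the parallelization argument concrete, here is a small Python sketch over a hypothetical DAG. The model names stg_reviews, stg_payments, int_order_payments, daily_reviews_v1 and daily_reviews_v2 are invented for illustration; only the other names come from this PR. It compares how deep each version of the model sits and which models it must wait for.

```python
# Hypothetical DAG for illustration: each model maps to the models it refs.
# stg_reviews, stg_payments, int_order_payments and the daily_reviews_* names
# are assumptions; only the other model names come from the PR text.
DAG: dict[str, list[str]] = {
    "stg_orders": [],
    "stg_reviews": [],
    "stg_payments": [],
    "int_orders_latest_customer_reviews": ["stg_reviews"],
    "int_order_payments": ["stg_orders", "stg_payments"],
    "fct_orders": [
        "stg_orders",
        "int_orders_latest_customer_reviews",
        "int_order_payments",
    ],
    "daily_reviews_v1": ["fct_orders"],  # query (1)
    "daily_reviews_v2": ["stg_orders", "int_orders_latest_customer_reviews"],  # query (2)
}


def depth(model: str, dag: dict[str, list[str]]) -> int:
    """Longest chain of parents above `model` (0 for a model with no parents).

    With unlimited threads, this is roughly how many sequential build steps
    must finish before `model` can start.
    """
    parents = dag[model]
    return 0 if not parents else 1 + max(depth(p, dag) for p in parents)


def upstream(model: str, dag: dict[str, list[str]]) -> set[str]:
    """All models that must be built before `model` (transitive parents)."""
    seen: set[str] = set()
    stack = list(dag[model])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(dag[node])
    return seen


for name in ("daily_reviews_v1", "daily_reviews_v2"):
    print(name, depth(name, DAG), sorted(upstream(name, DAG)))
```

In this toy graph, version (1) sits at depth 3 and waits on six upstream models, including stg_payments, which it never uses, while version (2) sits at depth 2 and waits on only three.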

The problem is that writing models like (2) is harder than writing models like (1). To write (2), the developer needs to search the existing DAG for the shallowest available references to the features they need (o.created_at, o.status, customer_review), and then reconstruct a join that probably already exists in fct_orders.
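The manual search described above amounts to "for each feature, find the shallowest model that exposes it." A minimal Python sketch of that search, assuming a hypothetical catalog of model depths and columns (the depths and the extra `amount` column are illustrative, not taken from a real project):

```python
# Hypothetical catalog: model name -> (depth in the DAG, columns it exposes).
# The depths and the extra `amount` column are illustrative assumptions.
CATALOG: dict[str, tuple[int, set[str]]] = {
    "stg_orders": (0, {"order_id", "created_at", "status"}),
    "int_orders_latest_customer_reviews": (1, {"order_id", "customer_review"}),
    "fct_orders": (2, {"order_id", "created_at", "status", "customer_review", "amount"}),
}


def shallowest_sources(
    features: list[str], catalog: dict[str, tuple[int, set[str]]]
) -> dict[str, str]:
    """Map each requested column to the shallowest model that exposes it."""
    chosen: dict[str, str] = {}
    for feature in features:
        candidates = [
            (model_depth, model)
            for model, (model_depth, columns) in catalog.items()
            if feature in columns
        ]
        if not candidates:
            raise KeyError(f"no model exposes column {feature!r}")
        # min() on (depth, name) prefers the shallowest model; ties go to the
        # alphabetically first name so the result is deterministic.
        chosen[feature] = min(candidates)[1]
    return chosen


print(shallowest_sources(["created_at", "status", "customer_review"], CATALOG))
```

This greedy search only picks sources. It does not check that the chosen models share a join key or that joining them preserves the grain, which is the information a concept definition makes explicit.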

It would be great if developers could simply name an entity grain and its features, and have dbt generate the minimal joins (and detect cycles) automatically.
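One way to detect such cycles is to check, before adding the refs a cref resolves to, whether the calling model is already upstream of any of them. Below is a minimal Python sketch of that check. It assumes the DAG is available as a plain parent map (this is not dbt's internal graph API), and int_review_scores is a hypothetical model used only to produce a cycle.

```python
# Hypothetical parent map (model -> models it refs). int_review_scores is an
# invented intermediate model used only to demonstrate a cycle.
DAG: dict[str, list[str]] = {
    "stg_orders": [],
    "stg_reviews": [],
    "int_review_scores": ["stg_reviews"],
    "int_orders_latest_customer_reviews": ["int_review_scores"],
}


def creates_cycle(model: str, new_parents: list[str], dag: dict[str, list[str]]) -> bool:
    """Return True if making `new_parents` parents of `model` would close a cycle.

    That happens exactly when `model` is one of the new parents or is already
    upstream of one of them, so we walk upstream from each new parent.
    """
    stack = list(new_parents)
    seen: set[str] = set()
    while stack:
        node = stack.pop()
        if node == model:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(dag.get(node, []))
    return False


# int_review_scores asking only for `status` resolves to stg_orders: no cycle.
print(creates_cycle("int_review_scores", ["stg_orders"], DAG))
# Asking for `customer_review` would also pull in a model built from it: cycle.
print(creates_cycle("int_review_scores", ["stg_orders", "int_orders_latest_customer_reviews"], DAG))
```

Because a cref only adds the joins its requested columns need, the same concept can be safe for one column list and cyclic for another, so a check like this would need to run per cref call.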

Solution

This PR introduces the Conceptual Ref, cref(), and the Concept definition (the word Entity was already taken!).

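A 'concept' is defined in YAML, similar to a LookML Explore or a semantic model, except that every join must be either many-to-one (M:1) or one-to-one (1:1) relative to the base table. Because no join can fan out rows, selecting any subset of a concept's features keeps the base table's grain. A concept therefore represents the potential feature joins to a given grain.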
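The M:1 / 1:1 restriction is what keeps the grain intact: if the join key is unique on the joined side, each base row matches at most one joined row. A small Python sketch on invented toy rows illustrates this. A left join is used here so unmatched orders stay visible; the compiled SQL in this PR uses a plain join, which can drop unmatched rows but, under this constraint, never duplicates them.

```python
from collections import Counter

Row = dict[str, object]


def is_grain_preserving(joined_rows: list[Row], foreign_key: str) -> bool:
    """True if each foreign_key value occurs at most once in the joined model.

    Then every base row matches at most one joined row, i.e. the join is M:1
    or 1:1 relative to the base table and cannot fan out rows.
    """
    counts = Counter(row[foreign_key] for row in joined_rows)
    return all(n <= 1 for n in counts.values())


def left_join(base_rows: list[Row], joined_rows: list[Row], base_key: str, foreign_key: str) -> list[Row]:
    """Naive left join: keep every base row, attach matching joined columns."""
    index: dict[object, list[Row]] = {}
    for row in joined_rows:
        index.setdefault(row[foreign_key], []).append(row)
    out: list[Row] = []
    for base in base_rows:
        # Unmatched base rows pair with an empty dict, so they survive unchanged.
        for match in index.get(base[base_key], [{}]):
            merged = dict(base)
            merged.update({k: v for k, v in match.items() if k != foreign_key})
            out.append(merged)
    return out


# Toy data (invented): three orders, one review table unique per order, one not.
orders = [
    {"order_id": 1, "status": "completed"},
    {"order_id": 2, "status": "completed"},
    {"order_id": 3, "status": "pending"},
]
latest_reviews = [{"order_id": 1, "customer_review": 5}, {"order_id": 2, "customer_review": 3}]
all_reviews = [
    {"order_id": 1, "customer_review": 4},
    {"order_id": 1, "customer_review": 5},
    {"order_id": 2, "customer_review": 3},
]

print(is_grain_preserving(latest_reviews, "order_id"), len(left_join(orders, latest_reviews, "order_id", "order_id")))
print(is_grain_preserving(all_reviews, "order_id"), len(left_join(orders, all_reviews, "order_id", "order_id")))
```

Joining the "latest review" table keeps three rows, one per order. Joining the raw review table yields four rows because order 1 has two reviews, which is exactly the fan-out a concept join is not allowed to cause.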

This lets models be written like query (1) while keeping the advantages of (2).

select 
  created_at::date as date, 
  avg(customer_review) as avg_review
from {{ cref('orders', ['created_at', 'status', 'customer_review']) }}
where status = 'completed'
group by 1

At compile time, the cref is replaced by a subquery over the minimal set of refs needed for the requested columns, similar to how the new microbatch date filters wrap refs in subqueries.

select 
  created_at::date as date, 
  avg(customer_review) as avg_review
from (
  select o.order_id, o.created_at, o.status, olcr.customer_review
  from {{ ref('stg_orders') }} as o
  join {{ ref('int_orders_latest_customer_reviews') }} as olcr
    on o.order_id = olcr.order_id
) as o
where status = 'completed'
group by 1
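The rendering step can be sketched in Python: include the base model's columns directly, add a join only when a requested column lives on that join, and always keep the primary key. This is a hypothetical illustration, not the PR's implementation. The class names are invented, and `base_alias` is an assumed field because the YAML shown in this PR has no alias for the base model.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptJoin:
    name: str  # model to ref, as in the YAML `joins[].name`
    alias: str
    base_key: str
    foreign_key: str
    columns: list[str]


@dataclass
class Concept:
    name: str
    base_model: str
    primary_key: str
    columns: list[str]
    joins: list[ConceptJoin] = field(default_factory=list)
    base_alias: str = "o"  # assumption: the YAML shown has no base alias field


def render_cref(concept: Concept, requested: list[str]) -> str:
    """Render the subquery for a cref, joining only the models that are needed."""
    a = concept.base_alias
    selected = [f"{a}.{concept.primary_key}"]  # always keep the grain key
    needed: list[ConceptJoin] = []
    for col in dict.fromkeys(requested):  # de-duplicate while keeping order
        if col == concept.primary_key:
            continue
        if col in concept.columns:
            selected.append(f"{a}.{col}")
            continue
        source = next((j for j in concept.joins if col in j.columns), None)
        if source is None:
            raise ValueError(f"column {col!r} is not defined on concept {concept.name!r}")
        if source not in needed:
            needed.append(source)
        selected.append(f"{source.alias}.{col}")
    lines = [
        "select " + ", ".join(selected),
        f"from {{{{ ref('{concept.base_model}') }}}} as {a}",
    ]
    for j in needed:
        lines.append(f"join {{{{ ref('{j.name}') }}}} as {j.alias}")
        lines.append(f"  on {a}.{j.base_key} = {j.alias}.{j.foreign_key}")
    return "(\n  " + "\n  ".join(lines) + f"\n) as {a}"


ORDERS = Concept(
    name="orders",
    base_model="stg_orders",
    primary_key="order_id",
    columns=["order_id", "created_at", "status"],
    joins=[
        ConceptJoin(
            name="int_orders_latest_customer_reviews",
            alias="olcr",
            base_key="order_id",
            foreign_key="order_id",
            columns=["customer_review"],
        )
    ],
)

print(render_cref(ORDERS, ["created_at", "status", "customer_review"]))
```

With these inputs, the printed subquery matches the compiled example above. Requesting only `created_at` and `status` would drop the join to int_orders_latest_customer_reviews entirely, which is what keeps the dependency minimal.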

The underlying YAML would look like this:

concepts:
  - name: orders
    description: ""
    base_model: stg_orders
    primary_key: order_id
    columns:
      - name: order_id
      - name: created_at
      - name: status
      # ...
    joins:
      - name: int_orders_latest_customer_reviews
        base_key: order_id
        foreign_key: order_id
        alias: olcr
        columns:
          - name: customer_review

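Some problems with a concept could be caught when the YAML is parsed, before any cref is compiled. The Python sketch below assumes the YAML has already been loaded into plain dicts (for example by a YAML library) and flags issues that would make resolution ambiguous or broken. These rules are illustrative, not the PR's own validation logic, and the default of alias to the join name is an assumption. Join cardinality itself cannot be checked from the YAML. In dbt it would typically be enforced with a `unique` test on the joined model's key.

```python
# The YAML above, as the dict a YAML loader would produce. The elided
# columns ("..") are left out; only the listed ones are included.
ORDERS_SPEC = {
    "name": "orders",
    "description": "",
    "base_model": "stg_orders",
    "primary_key": "order_id",
    "columns": [{"name": "order_id"}, {"name": "created_at"}, {"name": "status"}],
    "joins": [
        {
            "name": "int_orders_latest_customer_reviews",
            "base_key": "order_id",
            "foreign_key": "order_id",
            "alias": "olcr",
            "columns": [{"name": "customer_review"}],
        }
    ],
}


def validate_concept(spec: dict) -> list[str]:
    """Return human-readable problems with a concept spec (empty list if none)."""
    errors: list[str] = []
    base_cols = [c["name"] for c in spec.get("columns", [])]
    if spec["primary_key"] not in base_cols:
        errors.append(f"primary_key {spec['primary_key']!r} is not a declared column")
    # Track which source exposes each column; a column with two sources is ambiguous.
    owner = {c: spec["base_model"] for c in base_cols}
    aliases: set[str] = set()
    for join in spec.get("joins", []):
        alias = join.get("alias", join["name"])  # assumption: alias defaults to name
        if alias in aliases:
            errors.append(f"duplicate join alias {alias!r}")
        aliases.add(alias)
        if join["base_key"] not in base_cols:
            errors.append(f"join {join['name']!r}: base_key {join['base_key']!r} is not a base column")
        for col in (c["name"] for c in join.get("columns", [])):
            if col in owner:
                errors.append(f"column {col!r} is exposed by both {owner[col]!r} and {join['name']!r}")
            else:
                owner[col] = join["name"]
    return errors


print(validate_concept(ORDERS_SPEC))
```

The spec above validates cleanly. Adding a second join that also exposes `status` would be reported as ambiguous, because a cref requesting `status` could not tell which source to use.
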
Checklist

  • I have read the contributing guide and understand what's expected of me.
  • I have run this code in development, and it appears to resolve the stated issue.
  • This PR includes tests, or tests are not required or relevant for this PR.
  • This PR has no interface changes (e.g., macros, CLI, logs, JSON artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX.
  • This PR includes type annotations for new and modified functions.

jaysobel requested a review from a team as a code owner May 22, 2025 20:20

cla-bot (bot) commented May 22, 2025

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Jay Sobel.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. Check whether your git client is configured with an email to sign commits: git config --list | grep email
  2. If not, set it up using: git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings; see https://github.com/settings/emails

Contributor

Additional Artifact Review Required

Changes to artifact directory files require at least 2 approvals from core team members.

Contributor

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

github-actions (bot) added the community label (This PR is from a community member) May 22, 2025