Integrate DBT to Augur for Analytical Data Transformation #3048
base: dev
Signed-off-by: Allen Hu <[email protected]>
@AllenHsm : What is the benefit to Augur users?
Thanks for the question! I think DBT has the potential to make it easier for users to work with Augur data, because dbt enables users to build clean, reusable views so that we do not have to write complex SQL every time. It also adds testing and makes analytics easier to scale.
Pull Request Overview
This PR integrates DBT into Augur to enable analytical transformations on the database by adding a basic “hello world” DBT project.
- Added configuration files (dbt_project.yml, profiles.yml) to set up DBT with Augur’s database.
- Introduced a model (repo_activity) with accompanying tests and integrated DBT commands into the Augur CLI.
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
dbt_project.yml | Defines the DBT project configuration for the Augur project |
augur/application/dbt/tests/repo_activity.yml | Adds a test file for the repo_activity model |
augur/application/dbt/profiles.yml | Configures the database connection parameters using environment variables |
augur/application/cli/dbt.py | Integrates DBT commands into the Augur CLI |
Files not reviewed (1)
- augur/application/dbt/models/repo_activity.sql: Language not supported
@cli.command("run")
def run_dbt():
    """Run DBT models."""
    if run_dbt_command(["run", "--profiles-dir", "/augur/application/dbt"]):
The DBT commands in the CLI functions include a manually specified '--profiles-dir' parameter while run_dbt_command already appends this flag using the computed profiles path. Consider removing the hardcoded '--profiles-dir' arguments from the CLI command invocations to avoid duplication and potential conflicts.
Suggested change:
- if run_dbt_command(["run", "--profiles-dir", "/augur/application/dbt"]):
+ if run_dbt_command(["run"]):
@copilot
The DBT commands in the CLI functions include a manually specified '--profiles-dir' parameter while run_dbt_command already appends this flag using the computed profiles path. Consider removing the hardcoded '--profiles-dir' arguments from the CLI command invocations to avoid duplication and potential conflicts.
I do not believe we can trust dbt to accurately compute the profile path to where we want it to be.
@copilot
The DBT commands in the CLI functions include a manually specified '--profiles-dir' parameter while run_dbt_command already appends this flag using the computed profiles path. Consider removing the hardcoded '--profiles-dir' arguments from the CLI command invocations to avoid duplication and potential conflicts.
I do not believe we can trust dbt to accurately compute the profile path to where we want it to be.
I just read it through and found that I have already resolved the profile path at line 26:
result = subprocess.run([dbt_executable] + command + ["--profiles-dir", dbt_profiles_path], check=True)
So at line 35, when it calls run_dbt_command, adding the path again is a duplicate.
I think it would be better to delete + ["--profiles-dir", dbt_profiles_path] from line 26, because line 35's call to run_dbt_command is perhaps more straightforward to understand and easier to modify.
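One way to avoid the duplication discussed above, whichever side ends up owning the flag, is to add --profiles-dir only when the caller has not already supplied it. The following is a minimal sketch, not the code in this PR: build_dbt_args is a hypothetical helper, and the run_dbt_command signature shown here (with profiles_dir and dbt_executable parameters) is an assumption made for illustration.

```python
import subprocess
from typing import List

PROFILES_FLAG = "--profiles-dir"


def build_dbt_args(command: List[str], profiles_dir: str) -> List[str]:
    """Return dbt arguments that contain the profiles flag exactly once.

    Hypothetical helper: if the caller already passed --profiles-dir
    (either as two tokens or as --profiles-dir=PATH), the caller's value
    is kept; otherwise the default directory is appended.
    """
    already_set = any(
        arg == PROFILES_FLAG or arg.startswith(PROFILES_FLAG + "=")
        for arg in command
    )
    if already_set:
        return list(command)
    return list(command) + [PROFILES_FLAG, profiles_dir]


def run_dbt_command(command: List[str], profiles_dir: str,
                    dbt_executable: str = "dbt") -> bool:
    """Run dbt and report success as a boolean (assumed signature)."""
    args = [dbt_executable] + build_dbt_args(command, profiles_dir)
    try:
        subprocess.run(args, check=True)
        return True
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Non-zero exit status, or the dbt executable is not on PATH.
        return False
```

With this shape, a CLI function can call run_dbt_command(["run"], profiles_dir) in the common case and still override the directory explicitly when needed.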
@AllenHsm : Thank you for addressing this issue! I am curious whether you could provide some instructions for running the "hello world" (i.e., what are the steps for use?).
Could you provide some instructions on how to use this?
@@ -0,0 +1,25 @@
-- SPDX-License-Identifier: MIT
@AllenHsm : So dbt would call this function in the hello world? Or would the materialized view be created automatically?
@sgoggins Yes, when the user calls "augur dbt run", dbt will first look at dbt_project.yml and scan the path at line 15: model-paths: ["augur/application/dbt/models"]. After that, it executes all the SQL files in the models folder.
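The discovery step described above can be approximated in a few lines. This is only a rough sketch of the idea, not dbt's actual implementation: dbt also renders Jinja and resolves dependencies between models before running them. The function name discover_models is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def discover_models(project_root: Path, model_paths: Iterable[str]) -> List[str]:
    """List model names found under the configured model paths.

    Each .sql file under a model path is treated as one model whose
    name is the file name without its extension.
    """
    names: List[str] = []
    for rel_path in model_paths:
        base = project_root / rel_path
        if not base.is_dir():
            # A configured path that does not exist contributes no models.
            continue
        for sql_file in sorted(base.rglob("*.sql")):
            names.append(sql_file.stem)
    return names


if __name__ == "__main__":
    # Path taken from the model-paths entry quoted in the comment above.
    print(discover_models(Path("."), ["augur/application/dbt/models"]))
```

Run from the repository root, this would be expected to list repo_activity once the model file from this PR is present.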
Pull Request Overview
This PR integrates DBT into Augur to enable analytical data transformations by adding a basic "Hello World" DBT project and CLI commands for running, debugging, testing, and compiling DBT models.
- Added dbt_project.yml with project configuration and model settings.
- Added repo_activity.sql test configuration in repo_activity.yml along with a corresponding profiles.yml for database connection.
- Integrated DBT commands into Augur CLI through a new dbt.py file.
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
dbt_project.yml | Adds the DBT project configuration and model settings. |
augur/application/dbt/tests/repo_activity.yml | Provides a test configuration for the repo_activity model. |
augur/application/dbt/profiles.yml | Configures database connection settings for DBT in Augur. |
augur/application/cli/dbt.py | Introduces CLI commands to run, debug, test, and compile DBT models. |
Files not reviewed (1)
- augur/application/dbt/models/repo_activity.sql: Language not supported
@cli.command("run")
def run_dbt():
    """Run DBT models."""
    if run_dbt_command(["run", "--profiles-dir", "/augur/application/dbt"]):
The CLI command 'run_dbt' explicitly passes a '--profiles-dir' argument even though run_dbt_command automatically appends it using a computed path. This results in the argument being specified twice, which may cause conflicts. Consider removing the redundant '--profiles-dir' parameter from the command list in this function (and similarly in the other CLI commands).
Suggested change:
- if run_dbt_command(["run", "--profiles-dir", "/augur/application/dbt"]):
+ if run_dbt_command(["run"]):
Of course! Basically, users only need two commands: augur dbt run and augur dbt test.
augur dbt run
When users call [...] Based on the SQL code, dbt will generate a [...]
augur dbt test
For the test command, the yml files under the [...]
Users can learn more about creating their own models, corresponding tests, and even other features of dbt here: https://docs.getdbt.com/docs/build/sql-models
Since this PR is more like giving dbt a shot, I did not add other dbt features like macros. Once they work well with DBT, they would also work well after integrating DBT into Augur.
Thank you @sgoggins for reviewing my PR. I do not know if my explanation meets your expectations, so please let me know if some part is still too vague.
Description
This PR integrates dbt (Data Build Tool) into Augur to enable analytical transformations on the database. It creates a tiny "hello world" dbt project under Augur.
- profiles.yml to configure Augur’s Postgres database for dbt. It takes in the env parameters set by Augur.
- repo_activity.sql aggregates commit and issue counts per repository. A simple test for this model is also added.
- dbt_project.yml which guides dbt execution and configures it.
This PR fixes #2295
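To illustrate the kind of aggregation the repo_activity model is described as performing, here is a small, self-contained sketch using an in-memory SQLite database. The table and column names (repo, commits, issues, repo_id, repo_name) are simplified assumptions for illustration only; they are not Augur's actual schema, and the real model runs against Postgres.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE repo (repo_id INTEGER PRIMARY KEY, repo_name TEXT);
        CREATE TABLE commits (commit_id INTEGER PRIMARY KEY, repo_id INTEGER);
        CREATE TABLE issues (issue_id INTEGER PRIMARY KEY, repo_id INTEGER);
        INSERT INTO repo VALUES (1, 'augur'), (2, 'empty-repo');
        INSERT INTO commits (repo_id) VALUES (1), (1), (1);
        INSERT INTO issues (repo_id) VALUES (1), (1);
        """
    )
    return conn


def repo_activity(conn: sqlite3.Connection):
    """Return (repo_id, repo_name, commit_count, issue_count) per repository.

    Correlated subqueries are used instead of joining both tables at once,
    so commit rows and issue rows cannot multiply each other's counts.
    """
    query = """
        SELECT r.repo_id,
               r.repo_name,
               (SELECT COUNT(*) FROM commits c WHERE c.repo_id = r.repo_id) AS commit_count,
               (SELECT COUNT(*) FROM issues i WHERE i.repo_id = r.repo_id) AS issue_count
        FROM repo r
        ORDER BY r.repo_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in repo_activity(build_demo_db()):
        print(row)
```

In a dbt project, a SELECT of this shape would live in the model's .sql file, and dbt would materialize its result as a table or view.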
Notes for Reviewers
DBT requires a separate Python virtual environment to prevent dependency conflicts. After setting up the new virtual environment, you can run dbt in the original augur virtual environment.
Create a new virtual environment
python3 -m venv ~/.virtualenvs/dbt_venv
Activate the virtual environment
dbt_venv\Scripts\activate # Windows
Install dbt and the Postgres adapter
Before running DBT, configure profiles.yml so that it matches your PostgreSQL setup. It is located at augur/augur/application/dbt/profiles.yml. By default, it takes environment parameters such as AUGUR_DB_PASSWORD. Modify the file if needed to include your database credentials.
Once the environment and profiles are set up, run DBT within Augur:
augur dbt test
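Because profiles.yml reads its connection settings from environment variables, a quick pre-flight check can catch an unset variable before dbt fails with a connection error. The sketch below is an illustration only: AUGUR_DB_PASSWORD comes from the notes above, while AUGUR_DB_USER is a hypothetical name standing in for whatever other variables your profiles.yml actually references.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(required: Iterable[str],
                     env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # AUGUR_DB_USER is a hypothetical example; adjust to match profiles.yml.
    required = ["AUGUR_DB_PASSWORD", "AUGUR_DB_USER"]
    missing = missing_env_vars(required)
    if missing:
        print("Set these variables before running dbt:", ", ".join(missing))
    else:
        print("All required variables are set.")
```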
Expected outputs:
- augur dbt run should generate a table repo_activity, showing commit and issue counts per repository.
- public.repo_activity table inside the Augur database.
Signed commits
P.S. The commits are shown as "Unverified" because I previously set my .edu email as my local git user email but set up GPG using my personal email. I have corrected the local git user email.