Description
Is this a new bug in dbt-core?
- I believe this is a new bug in dbt-core
- I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
When I add a new model to my dbt project and run dbt parse, everything works fine and the model appears in manifest.json. If I then make any change to the YAML file where the model is described (for example, adding a new column) and run dbt parse again, I get a DuplicatePatchPathError:
Compilation Error
dbt found two schema.yml entries for the same resource named test_catalog.test_schema.test_model. Resources and their associated columns may only be described a single time. To fix this, remove one of the resource entries for test_catalog.test_schema.test_model in this file:
- models\marts\dbt_test_marts\test_catalog.test_schema.test_model.yml
When I change any other file related to this model and leave the YAML untouched, partial parsing works fine.
I know this issue can be avoided by using full parsing instead of partial parsing, but dbt projects can be very large, and re-parsing the whole project each time a single model changes can be time-consuming and costly.
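As a stopgap, the full-parse cost can be limited to the runs that actually hit this error. Below is a minimal sketch, assuming dbt is on PATH and that the error text shown above appears in the command output. The helper names (`build_parse_command`, `parse_with_fallback`) are hypothetical; `--no-partial-parse` is the standard dbt flag for disabling partial parsing.

```python
import subprocess

# Text copied from the error message above; matching on it is an assumption of this sketch.
DUPLICATE_PATCH_MARKER = "dbt found two schema.yml entries for the same resource"


def build_parse_command(project_dir, profiles_dir, partial=True):
    """Build the `dbt parse` argument list, optionally disabling partial parsing."""
    cmd = ["dbt", "parse", "--project-dir", project_dir, "--profiles-dir", profiles_dir]
    if not partial:
        cmd.append("--no-partial-parse")
    return cmd


def parse_with_fallback(project_dir, profiles_dir, run=subprocess.run):
    """Try a partial parse first; retry with a full parse only on the duplicate-patch error."""
    first = run(build_parse_command(project_dir, profiles_dir), capture_output=True, text=True)
    if first.returncode == 0:
        return first
    if DUPLICATE_PATCH_MARKER not in (first.stdout + first.stderr):
        return first  # unrelated failure: do not hide it behind a retry
    return run(
        build_parse_command(project_dir, profiles_dir, partial=False),
        capture_output=True,
        text=True,
    )
```

Calling `parse_with_fallback("./dbt", ".")` returns the `CompletedProcess` of whichever run was last. This only hides the symptom: every YAML edit still triggers a full parse.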
Expected Behavior
dbt should allow partial parsing to succeed when a schema YAML file is changed.
Steps To Reproduce
Add the following test_catalog.test_schema.test_model.yml and test_catalog.test_schema.test_model.sql files to your project:
version: 2
models:
  - name: test_catalog.test_schema.test_model
    config:
      alias: test_model
      schema: test_schema
      materialized: table
    columns:
      - name: col1

SELECT 1 as col1
Then parse the project with this command: dbt parse --profiles-dir . --project-dir ./dbt --log-path ./dbt/logs --target-path ./target --debug
Then change the test_catalog.test_schema.test_model.yml file by adding a new column:
version: 2
models:
  - name: test_catalog.test_schema.test_model
    config:
      alias: test_model
      schema: test_schema
      materialized: table
    columns:
      - name: col1
      - name: col2
Then run the same command again: dbt parse --profiles-dir . --project-dir ./dbt --log-path ./dbt/logs --target-path ./target --debug
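The steps above can be scripted roughly as follows. This is a sketch, not a tested harness: it assumes an existing dbt project at ./dbt with a working profile in the current directory, places the model directly under models/ (my real project uses a subdirectory), and the function names are made up for illustration.

```python
import subprocess
from pathlib import Path

MODEL_NAME = "test_catalog.test_schema.test_model"


def render_schema_yaml(columns):
    """Render the schema file from the steps above with the given column names."""
    lines = [
        "version: 2",
        "models:",
        f"  - name: {MODEL_NAME}",
        "    config:",
        "      alias: test_model",
        "      schema: test_schema",
        "      materialized: table",
        "    columns:",
    ]
    lines += [f"      - name: {col}" for col in columns]
    return "\n".join(lines) + "\n"


def write_schema(models_dir, columns):
    """Write (or overwrite) the model's YAML file."""
    Path(models_dir).mkdir(parents=True, exist_ok=True)
    (Path(models_dir) / f"{MODEL_NAME}.yml").write_text(render_schema_yaml(columns))


def write_model_sql(models_dir):
    """Write the model's SQL file; it is never modified afterwards."""
    Path(models_dir).mkdir(parents=True, exist_ok=True)
    (Path(models_dir) / f"{MODEL_NAME}.sql").write_text("SELECT 1 as col1\n")


def dbt_parse(project_dir, profiles_dir):
    """Run `dbt parse` with partial parsing left at its default (enabled)."""
    cmd = ["dbt", "parse", "--profiles-dir", profiles_dir, "--project-dir", project_dir]
    return subprocess.run(cmd, capture_output=True, text=True)


def reproduce(project_dir="./dbt", profiles_dir="."):
    """Return the exit codes of the two parse runs."""
    models_dir = Path(project_dir) / "models"
    write_model_sql(models_dir)
    write_schema(models_dir, ["col1"])
    first = dbt_parse(project_dir, profiles_dir)
    write_schema(models_dir, ["col1", "col2"])  # only the YAML file changes
    second = dbt_parse(project_dir, profiles_dir)
    return first.returncode, second.returncode
```

On my setup, the first run succeeds and the second one fails with the error shown in this report.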
Relevant log output
> dbt parse --profiles-dir ./dags --project-dir ./dags/dbt --log-path ./dags/dbt/logs --target-path ./target --debug
15:39:43 Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E91A223E0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E93B239A0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E93B236A0>]}
15:39:43 Running with dbt=1.9.2
15:39:43 running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'version_check': 'True', 'debug': 'True', 'log_path': 'C:\\Users\\60098727\\dp--batch-proc-dlh-dbt-loader\\dags\\dbt\\logs', 'profiles_dir': './dags', 'fail_fast': 'False', 'use_colors': 'True', 'use_experimental_parser': 'False', 'empty': 'None', 'quiet': 'False', 'no_print': 'None', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'invocation_command': 'dbt parse --profiles-dir ./dags --project-dir ./dags/dbt --log-path ./dags/dbt/logs --target-path ./target --debug', 'introspect': 'True', 'log_format': 'default', 'target_path': './target', 'static_parser': 'True', 'send_anonymous_usage_stats': 'True'}
15:39:43 Sending event: {'category': 'dbt', 'action': 'project_id', 'label': 'f5857103-b702-4f4f-82e5-7333ff40212b', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E934612D0>]}
15:39:43 Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': 'f5857103-b702-4f4f-82e5-7333ff40212b', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E93C24AC0>]}
15:39:43 Registered adapter: trino=1.8.1
15:39:43 checksum: 12b12750b70de726cfd89136b8e24afc3f3e77597a97bff40ab7e5f9b39d5e18, vars: {}, profile: , target: , version: 1.9.2
15:39:44 Partial parsing enabled: 0 files deleted, 0 files added, 1 files changed.
15:39:44 Partial parsing: updated file: dlh://models\marts\dbt_test_marts\test_catalog.test_schema.test_model.yml
ParsedNodePatch(original_file_path='models\\marts\\dbt_test_marts\\test_catalog.test_schema.test_model.yml', yaml_key='models', package_name='dlh', name='test_catalog.test_schema.test_model', description='', meta={}, docs=Docs(show=True, node_color=None), config={'alias': 'test_model', 'schema': 'test_schema', 'materialized': 'table'}, columns={'col1': ColumnInfo(name='col1', description='', meta={}, data_type=None, constraints=[], quote=None, tags=[], _extra={}, granularity=None), 'col2': ColumnInfo(name='col2', description='', meta={}, data_type=None, constraints=[], quote=None, tags=[], _extra={}, granularity=None)}, access=None, version=None, latest_version=None, constraints=[], deprecation_date=None, time_spine=None)
15:39:44 Encountered an error:
Compilation Error
dbt found two schema.yml entries for the same resource named test_catalog.test_schema.test_model. Resources and their associated columns may only be described a single time. To fix this, remove one of the resource entries for test_catalog.test_schema.test_model in this file:
- models\marts\dbt_test_marts\test_catalog.test_schema.test_model.yml
15:39:44 Command `dbt parse` failed at 18:39:44.217854 after 1.15 seconds
15:39:44 Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E91A223E0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E9440AEF0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E9440AF80>]}
15:39:44 Flushing usage events
15:39:44 An error was encountered while trying to flush usage events
Environment
- OS: Debian GNU/Linux 12 (bookworm)
- Python: 3.10.15
- dbt: 1.9.2
Which database adapter are you using with dbt?
other (mention it in "Additional Context")
Additional Context
I did some research and found that if a model's entry in the manifest already has the patch_path property set, we get the DuplicatePatchPathError (dbt-core/core/dbt/parser/schemas.py, line 848 in 77d8e32).
But when we add a new model, the parser sets this property from the file ID, which prevents us from patching the manifest later (dbt-core/core/dbt/parser/schemas.py, line 865 in 77d8e32).
Maybe we can somehow pass a partial-parsing flag to NodePatchParser to conditionally skip the patch_path check and let partial parsing proceed? If I remove the patch_path check from lines 848-850 of core/dbt/parser/schemas.py, partial parsing works fine for me.
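To make the idea concrete, here is a toy model of the behavior described above. It is not dbt's actual code: `Node`, `apply_patch`, and the `reparsing_same_file` flag are hypothetical stand-ins, and the only thing carried over from dbt is the rule "a node that already has patch_path set cannot be patched again".

```python
from dataclasses import dataclass
from typing import Optional


class DuplicatePatchPathError(Exception):
    """Stand-in for dbt's error of the same name."""


@dataclass
class Node:
    name: str
    patch_path: Optional[str] = None
    columns: tuple = ()


def apply_patch(node, file_id, columns, reparsing_same_file=False):
    """Apply a schema patch to a node.

    `reparsing_same_file` is the hypothetical flag: when set, a patch coming
    from the same file that already patched the node is allowed through.
    """
    if node.patch_path is not None:
        same_file = node.patch_path == file_id
        if not (reparsing_same_file and same_file):
            raise DuplicatePatchPathError(
                f"{node.name} is already patched by {node.patch_path}"
            )
    node.columns = tuple(columns)
    node.patch_path = file_id  # set on first patch, as in the behavior described above


FILE_ID = "dlh://models/test_catalog.test_schema.test_model.yml"
node = Node("test_catalog.test_schema.test_model")
apply_patch(node, FILE_ID, ["col1"])  # first parse: patch_path gets set

try:
    apply_patch(node, FILE_ID, ["col1", "col2"])  # re-parse of the edited file
    second_parse_failed = False
except DuplicatePatchPathError:
    second_parse_failed = True

# With the hypothetical flag, the same re-parse goes through.
apply_patch(node, FILE_ID, ["col1", "col2"], reparsing_same_file=True)
```

In this sketch, a patch from a different file is still rejected even with the flag set, so the duplicate check keeps working across files. A real fix would also need to keep catching two entries for the same model inside one file, which this toy version does not handle.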
I use dbt-trino adapter but I believe the problem comes from dbt-core.