[Bug] partial parse fails when schema file is updated #11363

Open
@yakovlevvs

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When I add a new model to my dbt project and run dbt parse, everything works fine and the model appears in the manifest.json file. If I then make any change to the YAML file where the model is described (for example, add a new column) and run dbt parse again, I get a DuplicatePatchPathError:

Compilation Error
  dbt found two schema.yml entries for the same resource named test_catalog.test_schema.test_model. Resources and their associated columns may only be described a single time. To fix this, remove one of the resource entries for test_catalog.test_schema.test_model in this file:
   - models\marts\dbt_test_marts\test_catalog.test_schema.test_model.yml

When I change any other file related to this model and leave the YAML untouched, partial parsing works fine.
I know this issue can be avoided by using full parsing instead of partial parsing, but dbt projects can be very large, and re-parsing the whole project each time a single model changes can be time-consuming and costly.

Expected Behavior

dbt should allow partial parsing when a schema file is changed.

Steps To Reproduce

Add the following test_catalog.test_schema.test_model.yml and test_catalog.test_schema.test_model.sql files to your project:

version: 2

models:
  - name: test_catalog.test_schema.test_model
    config:
      alias: test_model
      schema: test_schema
      materialized: table
    columns:
      - name: col1

SELECT 1 as col1

Then parse the project with this command: dbt parse --profiles-dir . --project-dir ./dbt --log-path ./dbt/logs --target-path ./target --debug

Then change the test_catalog.test_schema.test_model.yml file by adding a new column:

version: 2

models:
  - name: test_catalog.test_schema.test_model
    config:
      alias: test_model
      schema: test_schema
      materialized: table
    columns:
      - name: col1
      - name: col2

Then run the same command again: dbt parse --profiles-dir . --project-dir ./dbt --log-path ./dbt/logs --target-path ./target --debug
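
To make the two states of the YAML file easier to switch between, here is a small helper sketch. It only writes the two files shown above; the function names and the idea of passing the model directory as an argument are my own assumptions, and dbt parse still has to be run by hand between the two calls.

```python
from pathlib import Path

# Model name taken from the reproduction steps above.
MODEL_NAME = "test_catalog.test_schema.test_model"


def schema_yaml(columns):
    """Return the schema YAML from the steps above with the given column names."""
    lines = [
        "version: 2",
        "",
        "models:",
        f"  - name: {MODEL_NAME}",
        "    config:",
        "      alias: test_model",
        "      schema: test_schema",
        "      materialized: table",
        "    columns:",
    ]
    lines += [f"      - name: {col}" for col in columns]
    return "\n".join(lines) + "\n"


def write_model_files(model_dir, columns):
    """Write the .yml and .sql files for the test model into model_dir (hypothetical helper)."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    yml_path = model_dir / f"{MODEL_NAME}.yml"
    sql_path = model_dir / f"{MODEL_NAME}.sql"
    yml_path.write_text(schema_yaml(columns))
    sql_path.write_text("SELECT 1 as col1\n")
    return yml_path, sql_path
```

Call write_model_files(<your model directory>, ["col1"]), run dbt parse, then call it again with ["col1", "col2"] and run dbt parse a second time. On my setup the second run is the one that fails.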

Relevant log output

> dbt parse --profiles-dir ./dags --project-dir ./dags/dbt --log-path ./dags/dbt/logs --target-path ./target --debug
15:39:43  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E91A223E0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E93B239A0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E93B236A0>]}
15:39:43  Running with dbt=1.9.2
15:39:43  running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'version_check': 'True', 'debug': 'True', 'log_path': 'C:\\Users\\60098727\\dp--batch-proc-dlh-dbt-loader\\dags\\dbt\\logs', 'profiles_dir': './dags', 'fail_fast': 'False', 'use_colors': 'True', 'use_experimental_parser': 'False', 'empty': 'None', 'quiet': 'False', 'no_print': 'None', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'invocation_command': 'dbt parse --profiles-dir ./dags --project-dir ./dags/dbt --log-path ./dags/dbt/logs --target-path ./target --debug', 'introspect': 'True', 'log_format': 'default', 'target_path': './target', 'static_parser': 'True', 'send_anonymous_usage_stats': 'True'}  
15:39:43  Sending event: {'category': 'dbt', 'action': 'project_id', 'label': 'f5857103-b702-4f4f-82e5-7333ff40212b', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E934612D0>]}
15:39:43  Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': 'f5857103-b702-4f4f-82e5-7333ff40212b', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E93C24AC0>]}
15:39:43  Registered adapter: trino=1.8.1
15:39:43  checksum: 12b12750b70de726cfd89136b8e24afc3f3e77597a97bff40ab7e5f9b39d5e18, vars: {}, profile: , target: , version: 1.9.2
15:39:44  Partial parsing enabled: 0 files deleted, 0 files added, 1 files changed.
15:39:44  Partial parsing: updated file: dlh://models\marts\dbt_test_marts\test_catalog.test_schema.test_model.yml
ParsedNodePatch(original_file_path='models\\marts\\dbt_test_marts\\test_catalog.test_schema.test_model.yml', yaml_key='models', package_name='dlh', name='test_catalog.test_schema.test_model', description='', meta={}, docs=Docs(show=True, node_color=None), config={'alias': 'test_model', 'schema': 'test_schema', 'materialized': 'table'}, columns={'col1': ColumnInfo(name='col1', description='', meta={}, data_type=None, constraints=[], quote=None, tags=[], _extra={}, granularity=None), 'col2': ColumnInfo(name='col2', description='', meta={}, data_type=None, constraints=[], quote=None, tags=[], _extra={}, granularity=None)}, access=None, version=None, latest_version=None, constraints=[], deprecation_date=None, time_spine=None)

15:39:44  Encountered an error:
Compilation Error
  dbt found two schema.yml entries for the same resource named test_catalog.test_schema.test_model. Resources and their associated columns may only be described a single time. To fix this, remove one of the resource entries for test_catalog.test_schema.test_model in this file:
   - models\marts\dbt_test_marts\test_catalog.test_schema.test_model.yml

15:39:44  Command `dbt parse` failed at 18:39:44.217854 after 1.15 seconds
15:39:44  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E91A223E0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E9440AEF0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E9440AF80>]}
15:39:44  Flushing usage events
15:39:44  An error was encountered while trying to flush usage events

Environment

- OS: Debian GNU/Linux 12 (bookworm)
- Python: 3.10.15
- dbt: 1.9.2

Which database adapter are you using with dbt?

other (mention it in "Additional Context")

Additional Context

I did some research and found that if a model in the manifest already has the patch_path property set, we get the DuplicatePatchPathError:

if node.patch_path:

But when we add a new model, the parser sets this property from the file ID, which prevents us from patching the manifest in the future:
node.patch_path = patch.file_id

Maybe we could pass a partial-parsing flag to NodePatchParser to conditionally skip the patch_path check and let partial parsing proceed? If I remove the patch_path check from lines 848-850 of core/dbt/parser/schemas.py, partial parsing works fine for me.
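
To illustrate the idea, here is a minimal standalone sketch. It is not dbt-core code: Node, Patch, DuplicatePatchError, apply_patch and the changed_file_ids parameter are all hypothetical stand-ins I made up for this example. The check is skipped only when the incoming patch comes from the same file that originally patched the node and that file is one partial parsing reported as changed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class DuplicatePatchError(Exception):
    """Stand-in for dbt's DuplicatePatchPathError (hypothetical)."""


@dataclass
class Node:
    # Simplified stand-in for a manifest node; not the real dbt class.
    name: str
    patch_path: Optional[str] = None
    columns: List[str] = field(default_factory=list)


@dataclass
class Patch:
    # Simplified stand-in for a parsed schema patch; not the real dbt class.
    name: str
    file_id: str
    columns: List[str] = field(default_factory=list)


def apply_patch(node, patch, changed_file_ids=frozenset()):
    """Apply patch to node, allowing a re-patch from a file marked as changed."""
    already_patched = bool(node.patch_path)
    # Assumption: a re-patch is legitimate only if it comes from the very
    # file that patched the node before AND that file was re-parsed.
    same_changed_file = (
        node.patch_path == patch.file_id and patch.file_id in changed_file_ids
    )
    if already_patched and not same_changed_file:
        raise DuplicatePatchError(
            f"two schema entries for {node.name}: {node.patch_path}, {patch.file_id}"
        )
    node.patch_path = patch.file_id
    node.columns = list(patch.columns)
```

With this sketch, the first call patches the node, a second call from the same file passes only if that file is listed in changed_file_ids, and a patch from any other file still raises. One caveat: a real duplicate entry inside the changed file itself would not be caught this way, so this is a starting point rather than a proposed fix.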

I use the dbt-trino adapter, but I believe the problem comes from dbt-core.
