Skip to content

ENG-663: add StagedResourceAncestors table and remove child_diff_statuses #6185

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Jun 6, 2025

Conversation

adamsachs
Copy link
Contributor

@adamsachs adamsachs commented May 30, 2025

Supports https://github.com/ethyca/fidesplus/pull/2201

Description Of Changes

DB model updates to support computing child_diff_statuses at query time, not tracking it in DB.

Also moves our post_upgrade_index_creation utility to be kicked off as part of fides server startup, rather than as a standalone script executed outside of app runtime. This is so that we don't have to manually execute the script from a location with app DB access, against all tenants/clients that need the indexes created; instead, it happens automatically when the server boots up.

⚠️ Data migration

This PR contains a data migration which can take a long time (>5 minutes) to run on environments with many (>500k) stagedresource records. App health checks should be adjusted in environments where this is anticipated, in order to avoid crash loops on upgrade.

The data migration can also be memory intensive in these cases (~2GB utilized locally on a case with 1.5 million stagedresource records), so memory allocation should be adjusted accordingly for the duration of the migration.

⚠️

Code Changes

  • Moves our post_upgrade_index_creation utility to be executed as a non-blocking operation kicked off on on server startup
  • Removes child_diff_statuses from StagedResource DB model
  • Adds an index to diff_status in StagedResource table
  • Creates a StagedResourceAncestor junction table to be used for efficiently calculating child_diff_statuses at query time
    • utility classmethod on this model class to help with efficient bulk-creation of ancestor records
  • Migrations to (1) create stagedresourceancestor table (2) populate that table with data, based on existing stagedresource records and their hierarchical relationships, then add indexes and constraints to that table (3) remove the stagedresource.child_diff_statuses column (this is done after (2) so that we don't remove this column until (2) completes successfully)
    • NOTE: the data migration (2) will not create indexes and constraints as part of the migration if >1 million stagedresourceancestor records are created as part of the migration, to avoid long-running operations during the migration execution, which will prevent the server from starting up quickly. Instead, an INFO message is logged, and the
      "post upgrade creation" utility handles creation of the index.

Steps to Confirm

  • test migration with high volumes of data
  • verify post upgrade index creation:

initial migration:

2025-06-06 13:06:28.202 | INFO     |  - Populating staged resource ancestor links...
2025-06-06 13:06:46.789 | INFO     |  - Processing batch of 500000 resources
2025-06-06 13:07:00.340 | INFO     |  - Processing batch of 500000 resources
2025-06-06 13:07:14.910 | INFO     |  - Processing batch of 500000 resources
2025-06-06 13:07:15.813 | INFO     |  - Processing batch of 500000 resources
2025-06-06 13:07:15.833 | INFO     |  - Recursively processing 203100 resources for ancestor links in current batch
2025-06-06 13:07:47.063 | INFO     |  - Found 7379882 ancestor links in current batch
2025-06-06 13:07:47.067 | INFO     |  - Writing 7379882 ancestor links to CSV file
2025-06-06 13:08:19.897 | INFO     |  - Copying 7379882 ancestor links from CSV file into stagedresourceancestor table...
2025-06-06 13:10:05.218 | INFO     |  - Completed copying all ancestor links. Total ancestor links created: 7379882

post upgrade index creation:

fidesplus-slim  | 2025-06-05 20:24:42.522 | DEBUG    | fides.api.migrations.post_upgrade_index_creation:check_and_create_objects:198 - Constraint last_saved_for_email_per_property_id already exists, skipping index creation for ix_currentprivacypreferencev2_email_property_id
fidesplus-slim  | 2025-06-05 20:24:42.524 | DEBUG    | fides.api.migrations.post_upgrade_index_creation:check_and_create_objects:198 - Constraint last_saved_for_external_id_per_property_id already exists, skipping index creation for ix_currentprivacypreferencev2_external_id_property_id
fidesplus-slim  | 2025-06-05 20:24:42.528 | DEBUG    | fides.api.migrations.post_upgrade_index_creation:check_and_create_objects:198 - Constraint last_saved_for_fides_user_device_per_property_id already exists, skipping index creation for ix_currentprivacypreferencev2_fides_user_device_property_id
fidesplus-slim  | 2025-06-05 20:24:42.529 | DEBUG    | fides.api.migrations.post_upgrade_index_creation:check_and_create_objects:198 - Constraint last_saved_for_phone_number_per_property_id already exists, skipping index creation for ix_currentprivacypreferencev2_phone_number_property_id
fidesplus-slim  | 2025-06-05 20:24:42.531 | INFO     | fides.api.migrations.hash_migration_job:bcrypt_migration_task:57 - PrivacyPreferenceHistory is already fully migrated.
fidesplus-slim  | 2025-06-05 20:24:42.531 | DEBUG    | fides.api.migrations.post_upgrade_index_creation:check_and_create_objects:206 - Object ix_currentprivacypreferencev2_hashed_external_id already exists, skipping index/constraint creation
fidesplus-slim  | 2025-06-05 20:24:42.535 | INFO     | fides.api.migrations.hash_migration_job:bcrypt_migration_task:57 - ServedNoticeHistory is already fully migrated.
fidesplus-slim  | 2025-06-05 20:24:42.536 | DEBUG    | fides.api.migrations.post_upgrade_index_creation:check_and_create_objects:206 - Object last_saved_for_email_per_property_id already exists, skipping index/constraint creation
fidesplus-slim  | 2025-06-05 20:24:42.538 | DEBUG    | fides.api.migrations.post_upgrade_index_creation:check_and_create_objects:206 - Object last_saved_for_external_id_per_property_id already exists, skipping index/constraint creation
fidesplus-slim  | 2025-06-05 20:24:42.539 | DEBUG    | fides.api.migrations.post_upgrade_index_creation:check_and_create_objects:206 - Object last_saved_for_fides_user_device_per_property_id already exists, skipping index/constraint creation
fidesplus-slim  | 2025-06-05 20:24:42.540 | DEBUG    | fides.api.migrations.post_upgrade_index_creation:check_and_create_objects:206 - Object last_saved_for_phone_number_per_property_id already exists, skipping index/constraint creation
fidesplus-slim  | 2025-06-05 20:24:42.542 | DEBUG    | fides.api.migrations.post_upgrade_index_creation:check_and_create_objects:206 - Object ix_privacypreferencehistory_hashed_external_id already exists, skipping index/constraint creation
fidesplus-slim  | 2025-06-05 20:24:42.544 | DEBUG    | fides.api.migrations.post_upgrade_index_creation:check_and_create_objects:206 - Object ix_servednoticehistory_hashed_external_id already exists, skipping index/constraint creation
fidesplus-slim  | 2025-06-05 20:24:42.545 | INFO     | fidesplus.api.gvl.gvl_util:load_gvl_json:22 - Fetching GVL JSON from 'https://vendor-list.consensu.org/v3/vendor-list.json'
fidesplus-slim  | 2025-06-05 20:24:42.546 | INFO     | fides.api.migrations.post_upgrade_index_creation:create_object:172 - Creating index/constraint object by executing: 'ix_staged_resource_ancestor_pkey'...
...
fidesplus-slim  | 2025-06-05 20:24:49.439 | INFO     | fides.api.migrations.post_upgrade_index_creation:create_object:177 - Successfully created index/constraint object: 'ix_staged_resource_ancestor_pkey'
fidesplus-slim  | 2025-06-05 20:24:49.448 | INFO     | fides.api.migrations.post_upgrade_index_creation:create_object:172 - Creating index/constraint object by executing: 'ix_staged_resource_ancestor_unique'...
fidesplus-slim  | 2025-06-05 20:25:01.733 | INFO     | fides.api.migrations.post_upgrade_index_creation:create_object:177 - Successfully created index/constraint object: 'ix_staged_resource_ancestor_unique'
fidesplus-slim  | 2025-06-05 20:25:01.757 | INFO     | fides.api.migrations.post_upgrade_index_creation:create_object:172 - Creating index/constraint object by executing: 'uq_staged_resource_ancestor'...
fidesplus-slim  | 2025-06-05 20:25:01.792 | INFO     | fides.api.migrations.post_upgrade_index_creation:create_object:177 - Successfully created index/constraint object: 'uq_staged_resource_ancestor'
fidesplus-slim  | 2025-06-05 20:25:01.793 | INFO     | fides.api.migrations.post_upgrade_index_creation:create_object:172 - Creating index/constraint object by executing: 'fk_staged_resource_ancestor_ancestor'...
fidesplus-slim  | 2025-06-05 20:25:05.161 | INFO     | fides.api.main:log_request:248 - Request received | {'method': 'GET', 'status_code': 200, 'handler_time': '507.734ms', 'path': '/api/v1/plus/health', 'fides_client': 'unknown'}
fidesplus-slim  | 2025-06-05 20:25:07.463 | INFO     | fides.api.migrations.post_upgrade_index_creation:create_object:177 - Successfully created index/constraint object: 'fk_staged_resource_ancestor_ancestor'
fidesplus-slim  | 2025-06-05 20:25:07.474 | INFO     | fides.api.migrations.post_upgrade_index_creation:create_object:172 - Creating index/constraint object by executing: 'fk_staged_resource_ancestor_descendant'...
fidesplus-slim  | 2025-06-05 20:25:10.746 | INFO     | fides.api.migrations.post_upgrade_index_creation:create_object:177 - Successfully created index/constraint object: 'fk_staged_resource_ancestor_descendant'
fidesplus-slim  | 2025-06-05 20:25:10.755 | INFO     | fides.api.migrations.post_upgrade_index_creation:create_object:172 - Creating index/constraint object by executing: 'ix_staged_resource_ancestor_ancestor'...
fidesplus-slim  | 2025-06-05 20:25:12.531 | DEBUG    | fides.api.tasks:get_new_session:87 - DatabaseTaskSession ID: 281472662109536. Self ID: 281471930762288
fidesplus-slim  | 2025-06-05 20:25:12.536 | DEBUG    | fides.api.service.privacy_request.request_service:poll_for_exited_privacy_request_tasks:200 - Polling for privacy requests awaiting errored status change
fidesplus-slim  | 2025-06-05 20:25:18.103 | INFO     | fides.api.migrations.post_upgrade_index_creation:create_object:177 - Successfully created index/constraint object: 'ix_staged_resource_ancestor_ancestor'
fidesplus-slim  | 2025-06-05 20:25:18.114 | INFO     | fides.api.migrations.post_upgrade_index_creation:create_object:172 - Creating index/constraint object by executing: 'ix_staged_resource_ancestor_descendant'...
fidesplus-slim  | 2025-06-05 20:25:31.044 | INFO     | fides.api.migrations.post_upgrade_index_creation:create_object:177 - Successfully created index/constraint object: 'ix_staged_resource_ancestor_descendant'
fidesplus-slim  | 2025-06-05 20:25:31.047 | INFO     | fides.api.migrations.post_upgrade_index_creation:post_upgrade_index_creation_task:229 - Post upgrade index creation output: {"ix_staged_resource_ancestor_pkey": "in progress", "ix_staged_resource_ancestor_unique": "in progress", "uq_staged_resource_ancestor": "in progress", "fk_staged_resource_ancestor_ancestor": "in progress", "fk_staged_resource_ancestor_descendant": "in progress", "ix_staged_resource_ancestor_ancestor": "in progress", "ix_staged_resource_ancestor_descendant": "in progress"}

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration This indicates that a change includes a database migration label to the entry if your change includes a DB migration
    • Add a high-risk This issue suggests changes that have a high-probability of breaking existing code label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

Copy link

vercel bot commented May 30, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

2 Skipped Deployments
Name Status Preview Comments Updated (UTC)
fides-plus-nightly ⬜️ Ignored (Inspect) Visit Preview Jun 6, 2025 7:05pm
fides-privacy-center ⬜️ Ignored (Inspect) Jun 6, 2025 7:05pm

Copy link

codecov bot commented May 30, 2025

Codecov Report

Attention: Patch coverage is 92.30769% with 4 lines in your changes missing coverage. Please review.

Project coverage is 87.11%. Comparing base (f0427a3) to head (16a9ce8).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...des/api/service/privacy_request/request_service.py 25.00% 3 Missing ⚠️
src/fides/api/util/lock.py 95.45% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6185      +/-   ##
==========================================
+ Coverage   87.07%   87.11%   +0.03%     
==========================================
  Files         423      424       +1     
  Lines       26321    26356      +35     
  Branches     2870     2872       +2     
==========================================
+ Hits        22920    22960      +40     
+ Misses       2776     2769       -7     
- Partials      625      627       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.


ancestor_links: RelationshipProperty[List[StagedResourceAncestor]] = relationship(
"StagedResourceAncestor",
back_populates="descendant_staged_resource",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be ancestor_staged_resource and in relation descendant_links be descendant_staged_resource? Sorry, my head is exploding a little thinking about ancestors and descendants 🤯

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, my head is exploding a bit too now...let me double check on this, thanks for calling it out. at a minimum it deserves a code comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still need to circle back to this, but since tests that evaluate the relationships are passing, i'm going to leave these as-is 👍

@adamsachs adamsachs marked this pull request as ready for review June 5, 2025 14:54
Copy link
Contributor

@vcruces vcruces left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

@adamsachs adamsachs merged commit b43cc28 into main Jun 6, 2025
38 checks passed
@adamsachs adamsachs deleted the asachs/cds-refactor-fides branch June 6, 2025 19:23
Copy link

cypress bot commented Jun 6, 2025

fides    Run #12971

Run Properties:  status check passed Passed #12971  •  git commit b43cc283ab: ENG-663: add `StagedResourceAncestors` table and remove `child_diff_statuses` (#...
Project fides
Branch Review main
Run status status check passed Passed #12971
Run duration 00m 52s
Commit git commit b43cc283ab: ENG-663: add `StagedResourceAncestors` table and remove `child_diff_statuses` (#...
Committer Adam Sachs
View all properties for this run ↗︎

Test results
Tests that failed  Failures 0
Tests that were flaky  Flaky 0
Tests that did not run due to a developer annotating a test with .skip  Pending 0
Tests that did not run due to a failure in a mocha hook  Skipped 0
Tests that passed  Passing 5
View all changes introduced in this branch ↗︎

vcruces pushed a commit that referenced this pull request Jun 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants