Skip to content

Add sensitivityClassification #1686

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

slifty
Copy link
Member

@slifty slifty commented Jun 3, 2025

This PR adds the concept of a sensitivityClassification to BaseFields. If a base field is marked as FORBIDDEN then values associated with that field are no longer exposed and new data for those fields cannot be accepted.

In an upcoming commit we will be adding mechanisms to purge data associated with forbidden fields as well so that it is not even stored.

The FORBIDDEN status results in the following exclusions (which are all added as separate commits within this PR):

  1. Associated ProposalFieldValues are not included in Changemaker.values.
  2. Associated ProposalFieldValues are not included in ProposalVersion.fieldValues.
  3. Associated ApplicationFormFields are not included in Application.fields.
  4. ApplicationFormFields cannot be created using a BaseField that is set to FORBIDDEN.
  5. ProposalFieldValues cannot be created in relation to a BaseField that is set to FORBIDDEN.

Resolves #1621

@slifty slifty force-pushed the 1621-add-sensitivity-classification branch 2 times, most recently from cdcc1e2 to 77726e4 Compare June 3, 2025 16:39
Copy link

codecov bot commented Jun 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.81%. Comparing base (7a67517) to head (ddc2e2b).
Report is 54 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1686      +/-   ##
==========================================
+ Coverage   92.79%   92.81%   +0.01%     
==========================================
  Files         243      243              
  Lines        2624     2630       +6     
  Branches      402      403       +1     
==========================================
+ Hits         2435     2441       +6     
  Misses        183      183              
  Partials        6        6              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@slifty slifty marked this pull request as draft June 5, 2025 20:17
@slifty slifty force-pushed the 1621-add-sensitivity-classification branch 3 times, most recently from 96c45e6 to f87d08c Compare June 9, 2025 19:49
slifty added 12 commits June 9, 2025 15:49
We were using the looser `toMatchObject` in some of our baseField tests.
This just meant if the shape of the base field expanded, our tests would
not properly capture mistakes where that expanded form was not provided
(or returned correctly).
The `valueRelevanceHours` field requires a value (even if that value is
`null`).
This category allows a PDC administrator to specify the sensitivity of
data associated with a base field.  We will be able to use this to
make some data publicly available (e.g. for changemaker names) in
addition to actually disallowing the storage of certain data in the PDC.

The mechanisms for actually exposing public data and purging forbidden
data will be implemented in future commits.

Issue #1621 Add the concept of "too sensitive to store data" to base fields
When loading a changemaker we provide field values associated with that
changemaker for any organizational field.  Now that some base fields can
be FORBIDDEN we do not want to expose that data under any circumstance,
even if it has not yet been purged.

Issue #1621 Add the concept of "too sensitive to store data" to base fields
We use tabs in our sql files; this used spaces.
If a base field is forbidden we do not want to expose field values
associated with it.  This change removes those field values when loading
proposal versions at the database level.

Issue #1621 Add the concept of "too sensitive to store data" to base fields
Forbidden field values should be treated as though they don't exist at
all, which means that proposals where a forbidden field's value matches
a given search query should NOT be returned as a result of that match.

Issue #1621 Add the concept of "too sensitive to store data" to base fields
If the BaseField associated with an ApplicationFormField is marked as
forbidden we do not allow new data to be inserted for that field.
This commit enforces that requirement and returns a 400 error if a user
attempts to create a proposal version that violates the constraint.

Issue #1621 Add the concept of "too sensitive to store data" to base fields
The `POST /applicationForms` handler created multiple entities, but if
the creation of the actual form fields failed the application form would
still exist but in a partial state.  This is incorrect, and the whole
thing should be part of a transaction so that if any step fails there is
no change in data state.
If a `BaseField` is marked as forbidden then we do not want to allow
form fields to attempt to collect data for that base field.

Issue #1621 Add the concept of "too sensitive to store data" to base fields
If a `BaseField` has a `FORBIDDEN` `sensitivityClassification` then any
existing `ApplicationFormFields` that use that `BaseField` should not be
returned when loading `ApplicationForms`.

Eventually there will also be a mechanism to purge those forbidden form
fields, but for now we will merely stop exposing them.

Issue #1621 Add the concept of "too sensitive to store data" to base fields
We never want to return `FORBIDDEN` ProposalFieldValues. The primary way
to accomplish that is to ensure our queries appropriately condition
against selecting those values in the first place.

This commit is a "belt and suspenders" approach which is only taken on
because of the level of sensitivity that this date represents.  If
a query attempts to select a forbidden ProposalFieldValue it will result
in a database exception.

This doesn't completely prevent access (e.g. if a query accesses field
values without creating an entire row), but it is still a layer of
defense.

Issue #1621 Add the concept of "too sensitive to store data" to base fields
@slifty slifty force-pushed the 1621-add-sensitivity-classification branch from f87d08c to ddc2e2b Compare June 9, 2025 19:49
@slifty slifty marked this pull request as ready for review June 9, 2025 19:49
@slifty slifty requested review from bickelj and hminsky2002 June 9, 2025 19:49
Copy link
Contributor

@bickelj bickelj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having only skimmed the code, I first went to try it locally.

  1. I synced base fields from prod (hat tip to @hminsky2002 in Create worker to sync basefields #1259 which worked perfectly first try).
  2. I created boilerplate funders, associated opportunities, sources, and proposals.
  3. I created an application form with three fields, all at the time restricted.
  4. I posted a proposalVersion with all three fields filled in, no problem.
  5. In psql I updated one of those base fields to be forbidden then re-posted the same proposalVersion.

I successfully got an error, but I doubt this is the error we want. I think a 400 Bad Request (or some 4xx) would make more sense rather than 500 Internal Server Error.

Looking a little closer, this may be an issue with audit log code or something in the error throwing/handling between this PR and that code:

{
  "level": 50,
  "time": 1749568326472,
  "pid": 49507,
  "hostname": "host",
  "source": "/home/user/code/ots/ext/PhilanthropyDataCommons/service/dist/middleware/errorHandler.js",
  "wrappedError": {
    "tinyPgError": {
      "name": "TinyPgError",
      "queryContext": {
        "id": "9ee0586f-064f-4d29-ae72-db516c94acbf",
        "sql": "INSERT INTO service_query_audit_logs (\n\tuser_keycloak_user_id,\n\tuser_is_administrator,\n\tquery_name,\n\tquery_parameters\n) VALUES (\n\t$1,\n\t$2,\n\t$3,\n\t$4\n);\n",
        "start": 1749568326469,
        "name": "serviceQueryAuditLogs_insertOne",
        "params": {
          "authContextKeycloakUserId": "8f6e6dd9-d4af-45db-af50-712f7e962cd7",
          "authContextIsAdministrator": true,
          "queryName": "proposalFieldValues.insertOne",
          "queryParameters": {
            "authContextKeycloakUserId": "8f6e6dd9-d4af-45db-af50-712f7e962cd7",
            "authContextIsAdministrator": true,
            "proposalVersionId": 3,
            "applicationFormFieldId": 4,
            "position": 50,
            "value": "Call me Ishmael",
            "isValid": true,
            "goodAsOf": "2025-06-10T15:10:13.673Z"
          }
        },
        "submit": 1749568326471,
        "wait_duration": 2,
        "active_duration": 0,
        "end": 1749568326471,
        "duration": 2,
        "error": {
          "length": 145,
          "name": "error",
          "severity": "ERROR",
          "code": "25P02",
          "file": "postgres.c",
          "line": "1498",
          "routine": "exec_parse_message"
        },
        "data": null
      }
    },
    "name": "DatabaseError"
  },
  "statusCode": 500
}

The postgres log says:

2025-06-10 10:12:06.470 CDT [51454] 68484474.c8fe pdc@pdc 127.0.0.1(58736) ERROR:  Cannot insert proposal field value for forbidden application form field 6
2025-06-10 10:12:06.470 CDT [51454] 68484474.c8fe pdc@pdc 127.0.0.1(58736) CONTEXT:  PL/pgSQL function prevent_inserting_forbidden_proposal_field_value() line 14 at RAISE
2025-06-10 10:12:06.470 CDT [51454] 68484474.c8fe pdc@pdc 127.0.0.1(58736) STATEMENT:  INSERT INTO proposal_field_values (
		proposal_version_id,
		application_form_field_id,
		value,
		position,
		is_valid,
		good_as_of
	) VALUES (
		$1,
		$2,
		$3,
		$4,
		$5,
		$6
	)
	RETURNING proposal_field_value_to_json(proposal_field_values) AS object;
	
2025-06-10 10:12:06.471 CDT [51454] 68484474.c8fe pdc@pdc 127.0.0.1(58736) ERROR:  current transaction is aborted, commands ignored until end of transaction block
2025-06-10 10:12:06.471 CDT [51454] 68484474.c8fe pdc@pdc 127.0.0.1(58736) STATEMENT:  INSERT INTO service_query_audit_logs (
		user_keycloak_user_id,
		user_is_administrator,
		query_name,
		query_parameters
	) VALUES (
		$1,
		$2,
		$3,
		$4
	);
	
2025-06-10 10:12:06.471 CDT [51454] 68484474.c8fe pdc@pdc 127.0.0.1(58736) ERROR:  current transaction is aborted, commands ignored until end of transaction block
2025-06-10 10:12:06.471 CDT [51454] 68484474.c8fe pdc@pdc 127.0.0.1(58736) STATEMENT:  INSERT INTO service_query_audit_logs (
		user_keycloak_user_id,
		user_is_administrator,
		query_name,
		query_parameters
	) VALUES (
		$1,
		$2,
		$3,
		$4
	);
	

The GET /proposals endpoint successfully returned the existing proposal and filtered out the verboten field.

After that, I added a Changemaker and linked all the proposals to that new Changemaker. The following endpoints successfully filtered out the forbidden field:

  • GET /changemakers
  • GET /changemakerProposals
  • GET /proposals
  • GET /proposals/{proposalId}
  • GET /proposalVersions
  • GET /proposalVersions/{proposalVersionId}

So far this is looking really really good.

@slifty
Copy link
Member Author

slifty commented Jun 10, 2025

@bickelj oh interestin' -- it looks like the audit logs maybe are eating the check error (which should result in the 400 as you mention).

I wonder why the test I wrote isn't triggering that behavior...

Copy link
Contributor

@bickelj bickelj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks and works great. It achieves our purpose: forbid field values associated with base fields marked forbidden. There is a minor issue with exception propagation in the case of posting a forbidden field value but that should not prevent merging this PR.

@@ -5,11 +5,12 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Unreleased
## 0.19.0 2025-06-03
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update to 2025-06-10 (or 2025-06-11) before merge.

@@ -333,6 +340,49 @@ describe('/applicationForms', () => {
});
});

it('does not return formFields associated with `FORBIDDEN` BaseFields', async () => {
const systemFunder = await loadSystemFunder(db, null);
await createOpportunity(db, null, {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocking comment but for new tests I prefer to capture the result of the create such that I can reliably get the ID instead of assuming it's 1 in the rest of the test. We could conceivably run the tests concurrently on the same DB if all our tests did this. Even if we did that, we'd probably want to run them in sequence when they fail, to see any difference. Anyway, again, not a show-stopper at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes great point yes -- I'll fix these tests up to reflect that better approach

PUBLIC = 'public',
RESTRICTED = 'restricted',
FORBIDDEN = 'forbidden',
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, and these are perfect starting values!

FROM application_form_fields
WHERE application_form_fields.id = proposal_field_value.application_form_field_id;
SELECT EXISTS (
select 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is supposed to be a SELECT but maybe SQLfluff hath failed us again here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good catch -- (fluff doesn't lint anything inside a function unfortunately :()


ALTER TABLE base_fields
ADD COLUMN sensitivity_classification sensitivity_classification
DEFAULT 'restricted';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed the right default.

@@ -20,9 +27,11 @@ WHERE
OR :search = ''
) THEN
TRUE
ELSE
proposal_field_values.value_search
ELSE (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is part of a series of AND clauses, but could it be out top as its own clause within the WHERE? Or even an AND on the LEFT JOIN? Doubt it makes any difference query-planner-wise but may be more readable.

.status(201)
.contentType('application/json')
.send({
const finalApplicationForm = await db.transaction(async (transactionDb) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice use of transaction here.

@@ -48,6 +52,8 @@
"description",
"dataType",
"scope",
"valueRelevanceHours",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add the concept of "too sensitive to store data" to base fields
2 participants