Skip to content

[Feature branch] JS: Migrate to shared dataflow library #14412

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 226 commits into from
Aug 2, 2024

Conversation

asgerf
Copy link
Contributor

@asgerf asgerf commented Oct 9, 2023

Targeting a feature branch

This PR is targeting a branch named js/shared-dataflow-branch, not main as we normally would.

  • Merging into main will happen at a later point, and will require that
    • we have a good-looking DCA report (currently we have too many performance regressions), and
    • possibly more evaluations on different suites and/or a QA trial run, and
    • TODO comments have been resolved
  • The criteria for merging to js/shared-dataflow-branch should be that the code has been reviewed, and is considered good enough to go into main once the above criteria are met.
  • This means things like readability should be reviewed as if we were targeting main. We don't want to have to go back a re-review the code for readability later.

Review order

Commits have been organised into a reading order. The large number of commits is there to make it easier to the review.

Note that not every commit can compile in isolation. I split some commits into smaller chunks for easy of readability, but you may sometimes see odd artifacts like see a trailing and/or, an empty predicate body, or the occasional mention of a class that doesn't exist yet.

Otherwise the overarching order of commits are:

  1. The first commit is from this PR, which should be reviewed in its own PR.
  2. Core data flow changes: instantiating shared libraries and related work to enable that
  3. Library modeling changes: mainly rewriting collection models as flow summaries
  4. Query changes: porting to ConfigSig-style and updating tests
  5. Library tests: port test configurations to ConfigSig style, generic test output changes
  6. Add TODOs and further attempts at fixing some of the performance and precision issues.

Steps across function boundaries

The most difficult change for JS is that steps across function boundaries are no longer permitted, except for jump steps which can result in FPs and performance issues.

This means the following type of steps need to be rewritten:

  • Steps into or out of callbacks. This applies equally to value-preversing steps, taint steps, read steps, and store steps.
  • Store steps that target getALocalSource(), because that can cross function boundaries.
  • Steps that use the combined loadStoreStep feature, which is no longer supported.

The current status of this migration is:

  • Ported the core models for Array, Map, Set, Promise, and generators.
  • The above took care of all our existing uses of loadStoreStep.
  • String#replace has been converted, as it is the only string-related step that needed to be ported.
  • Value steps and taint steps from other models are implicitly converted to jump steps when they cross a function boundary.
  • Read/store steps from other models are just retained as they are for now, which in some case breaks the invariant by crossing a function boundary, as there is no combined jump-read or jump-store step we can map to.

Related reading:

Evaluation

The latest evaluation can be found here. As mentioned there are too many performance regressions at the moment, and alerts need to be triaged.

asgerf added 14 commits June 25, 2024 13:30
In preparation for migrating to the FlowSummary module in the qlpack,
rename predicates to be consistent with the qlpack.
This is identical to the code in Ruby and seems to prevent a bad join ordering
in a cached version of this predicate in DataFlowCommon
Only includes the changes that purely contain the new provenance columns
New library gets FN for spread arguments in a call to splice(), which
was added to the old version in this PR:
  github#16739
@asgerf asgerf force-pushed the js/shared-dataflow branch from 233b791 to af7b4e3 Compare June 26, 2024 11:53
asgerf added 11 commits June 27, 2024 09:06
These initially got messed up by a merge conflict where I couldn't rerun the tests due to breaking
changes in the data flow library. I wanted the breaking-change updates to live in their own commits,
not just eaten by a merge resolution commit, so the test output became broken for a while.

The '#select' result set is unchanged in all of these, so they should be safe to accept.
@asgerf
Copy link
Contributor Author

asgerf commented Jun 28, 2024

@erik-krogh after keeping up the CI changes and breaking changes in the data flow library, this PR is finally green again (apart from the QLDoc check which complains about shared libraries).

I'd like to merge it to the feature branch as it is now, so we can start working with smaller PRs. The first unreviewed commit so far is 102ca77.

Copy link
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ready for a feature branch 👍

Comment on lines +200 to +204
int totalorder() {
result = TotalOrdering::astNodeId(this.asSourceCallable()).bitShiftLeft(1)
or
result = TotalOrdering::libraryCallableId(this.asLibraryCallable()).bitShiftLeft(1) + 1
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the left-shift?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the IDs for AST nodes and library callables can't clash, because the low bit is zero for AST nodes and 1 for library callables.

@@ -1,11 +1,11 @@
legacyDataFlowDifference
| arrays.js:2:16:2:23 | "source" | arrays.js:39:8:39:24 | arr4_spread.pop() | only flow with OLD data flow library |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning to add this step to the new library?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm planning to fix this missing flow by adding general handling of the spread operator, so individual models don't have to specify how to handle spread operators. For example if a model specifies Argument[1] somewhere, and it applies to call that has a spread argument at position 0 or 1, then it should Just Work. The model shouldn't have to specify something like Argument[spread] which is what the legacy model currently does.

@asgerf asgerf merged commit 5c222f7 into github:js/shared-dataflow-branch Aug 2, 2024
12 of 13 checks passed
@asgerf
Copy link
Contributor Author

asgerf commented Aug 2, 2024

Merging to the feature branch (not main)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants