Skip to content

Dataflow: Refactor stage 6 to use shared stage code. #17105

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Aug 22, 2024

Conversation

aschackmull
Copy link
Contributor

@aschackmull aschackmull commented Jul 31, 2024

This extends the StagePathGraph and corresponding nodes to fully match the existing PathGraph and PathNode. With that in place, the existing stage 6 code is replaced by an instantiation of the Stage module to reduce code duplication.

Things to note:

  • Proper checking of clearsContent on store targets used to be postponed to stage 6, so this needed to be added to the StagePathGraph as well. Given that we now base the flow computation on node-reachability instead of edge-reachability, this means that we'll need to do a final forward reachability check to prune PathNode in order to avoid unreachable edges, since an edge using reachable nodes might be reverse-reachable from a sink without being forward-reachable.
  • Stateful in/out-barriers used to only be checked in the pathStep predicate, so in order to continue to support this, the check has been pushed into the general stage computation.
  • Provenance for flow-through edges no longer need to rely on additional information stored in PathNodes (the summaryLabel), but can instead be computed in a separate predicate. This gets rid of a lot of superfluous PathNodes (as can be seen in several qltests).
  • There's a small change to how we track a summary context due to differences between the prior implementation in pathStep and the current implementation in the stage modules. The latter includes the call site to determine whether flow-through is possible and hence whether to construct a summary context. For certain cases where a call target can have both flow-through from one call-site and flow-but-not-through from another call-site, this can result in a few additional PathNodes. This can be seen in the changes to the C++ qltest expected output.
  • We get a few additional subpaths when detecting subpath-wrappers in hidden code in the case when a flow-through path relies on two (or more) chained nested flow-through steps. This shows in a couple of qltests for some of the more involved library models, e.g. the C# Linq Aggregate method, which chains its data through two callbacks.

Commit-by-commit review is strongly encouraged. I made effort to split the changes into separately reviewable commits.

Fixes https://github.com/github/codeql-team/issues/3234

@aschackmull aschackmull changed the title Dataflow/stage6 Dataflow: Refactor stage 6 to use shared stage code. Aug 21, 2024
@aschackmull aschackmull marked this pull request as ready for review August 21, 2024 09:37
@aschackmull aschackmull requested review from a team as code owners August 21, 2024 09:37
@aschackmull
Copy link
Contributor Author

This change actually gets us a tiny precision improvement: Previously, the reachability in pathStep relied on consCand (aka push/pop) relation that was materialised up front, which meant that it used the output of Stage 5. But now, stage 6 is mutually recursive with its own cons-candidate relation. When access paths are fully precise (which they often are in stage 6) then this doesn't make any difference, but when stage 6 access paths are approximated, then this can make a small difference. This accounts for the removal of a few FPs highlighted in C# dca.

@aschackmull aschackmull added the no-change-note-required This PR does not need a change note label Aug 21, 2024
@geoffw0
Copy link
Contributor

geoffw0 commented Aug 21, 2024

CPP changes and DCA LGTM (as does Swift DCA). 👍

Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for making this refactoring, that makes the code much more maintainable.

Comment on lines 3179 to 3180
not this instanceof StagePathNodeSrcGrp and
not this instanceof StagePathNodeSinkGrp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, you could define private class TPathNode = TPathNodeMid or TPathNodeSink and then add TPathNode to the extent of PathNode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That wouldn't remove the need for a charpred, so I don't think it would gain much - TPathNode would still be larger than PathNode.

@aschackmull aschackmull merged commit d97a301 into github:main Aug 22, 2024
41 of 42 checks passed
@aschackmull aschackmull deleted the dataflow/stage6 branch August 22, 2024 07:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants