Skip to content

C++: Speed up cpp/leap-year/unchecked-after-arithmetic-year-modification#21768

Open
MathiasVP wants to merge 1 commit intomainfrom
speed-up-unchecked-leap-year-after-modification
Open

C++: Speed up cpp/leap-year/unchecked-after-arithmetic-year-modification#21768
MathiasVP wants to merge 1 commit intomainfrom
speed-up-unchecked-leap-year-after-modification

Conversation

@MathiasVP
Copy link
Copy Markdown
Contributor

This PR fixes a performance problem in cpp/leap-year/unchecked-after-arithmetic-year-modification which was caused by #21292.

The problem can be seen in this partial run:

[2026-04-28 05:55:08] Evaluated non-recursive predicate UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::S6Graph::localStep/9#f7e3c284@74ab34gm in 3319052ms (size: 310706007).
Evaluated relational algebra for predicate UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::S6Graph::localStep/9#f7e3c284@74ab34gm with tuple counts:
  260207923   ~6%    {7} r1 = SCAN num#UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::S6Graph::TPathNodeMid#586d6f71 OUTPUT In.0, In.3, In.4, In.5, In.1, In.2, In.6
                  
      582279   ~5%    {9} r2 = JOIN r1 WITH `UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage6::fwdFlowRead/11#743a499a_01239105678#join_rhs` ON FIRST 6 OUTPUT Lhs.6, Rhs.6, Lhs.4, Lhs.5, Rhs.7, Rhs.8, Rhs.9, _, _
      582279   ~0%    {9}    | REWRITE WITH Out.7 := "", Out.8 := false
                  
    9407058   ~3%    {8} r3 = JOIN r1 WITH `UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage6::fwdFlowStore/10#7594007a_0123894567#join_rhs` ON FIRST 6 OUTPUT Rhs.6, Lhs.2, Lhs.4, Lhs.5, Lhs.6, Rhs.7, Rhs.8, Rhs.9
    7512614   ~5%    {9}    | JOIN WITH `UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::AccessPath.isCons/2#dispred#1f6e207a_120#join_rhs` ON FIRST 2 OUTPUT Lhs.4, Lhs.7, Lhs.2, Lhs.3, Lhs.5, Rhs.2, Lhs.6, _, _
    7512614   ~3%    {9}    | REWRITE WITH Out.7 := "", Out.8 := true
                  
  260207923   ~4%    {7} r4 = SCAN num#UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::S6Graph::TPathNodeMid#586d6f71 OUTPUT In.1, In.0, In.2, In.3, In.4, In.5, In.6
  260207923   ~4%    {7}    | JOIN WITH `DataFlowImplCommon::CallContextSensitivity<UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage6Param::CallContextSensitivityInput>::CallContext#1201065c` ON FIRST 1 OUTPUT Lhs.0, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5, Lhs.6
                  
  260203381   ~4%    {7} r5 = r4 AND NOT `DataFlowImplCommon::CallContextSensitivity<UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage6Param::CallContextSensitivityInput>::PrunedViableImpl<DefaultPrunedViableImplInput>::LocalCallContext::getUnreachable/1#4e807359_0#antijoin_rhs`(FIRST 1)
  260203381   ~0%    {8}    | JOIN WITH cached_DataFlowImplCommon::Cached::TAnyLocalCall#c01dcce5 CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5, Lhs.6
  260203381   ~0%    {8}    | JOIN WITH DataFlowImplCommon::LocalCallContext#6167e342 ON FIRST 1 OUTPUT Lhs.2, Lhs.1, Lhs.3, Lhs.4, Lhs.5, Lhs.6, Lhs.7, Lhs.0
                  
        4542   ~0%    {8} r6 = JOIN r4 WITH `DataFlowImplCommon::CallContextSensitivity<UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage6Param::CallContextSensitivityInput>::PrunedViableImpl<DefaultPrunedViableImplInput>::LocalCallContext::getUnreachable/1#4e807359` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.0, Lhs.2, Lhs.3, Lhs.4, Lhs.5, Lhs.6
        4542   ~0%    {8}    | JOIN WITH cached_DataFlowImplCommon::Cached::TSpecificLocalCall#05dd466c ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5, Lhs.6, Lhs.7
        4542   ~0%    {8}    | JOIN WITH DataFlowImplCommon::LocalCallContext#6167e342 ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5, Lhs.6, Lhs.7, Lhs.0
                  
  260207923   ~0%    {8} r7 = r5 UNION r6
  260207923   ~1%    {8}    | JOIN WITH num#UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::S6Graph::TPathNodeMid#586d6f71 ON FIRST 7 OUTPUT Lhs.0, Lhs.7, Lhs.1, Lhs.2, Lhs.4, Lhs.5, Lhs.6, Lhs.3
  302370000   ~0%    {9}    | JOIN WITH `project#UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage5::LocalFlowBigStep<Stage4::LocalFlowBigStep<Stage3Param::localFlowBigStep>::localFlowBigStepTc>::localFlowBigStepTc/6#58c31154#5_0213#join_rhs` ON FIRST 2 OUTPUT Lhs.6, Rhs.2, Lhs.2, Lhs.3, Lhs.7, Lhs.4, Lhs.5, Rhs.3, _
  302370000   ~0%    {9}    | REWRITE WITH Out.8 := false
                  
  260207923   ~3%    {6} r8 = SCAN num#UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::S6Graph::TPathNodeMid#586d6f71 OUTPUT In.1, In.0, In.2, In.4, In.5, In.6
  260207923   ~3%    {6}    | JOIN WITH `DataFlowImplCommon::CallContextSensitivity<UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage6Param::CallContextSensitivityInput>::CallContext#1201065c` ON FIRST 1 OUTPUT Lhs.0, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5
                  
  260203381   ~3%    {6} r9 = r8 AND NOT `DataFlowImplCommon::CallContextSensitivity<UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage6Param::CallContextSensitivityInput>::PrunedViableImpl<DefaultPrunedViableImplInput>::LocalCallContext::getUnreachable/1#4e807359_0#antijoin_rhs`(FIRST 1)
  260203381   ~2%    {7}    | JOIN WITH cached_DataFlowImplCommon::Cached::TAnyLocalCall#c01dcce5 CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5
  260203381   ~0%    {7}    | JOIN WITH DataFlowImplCommon::LocalCallContext#6167e342 ON FIRST 1 OUTPUT Lhs.4, Lhs.2, Lhs.1, Lhs.3, Lhs.5, Lhs.6, Lhs.0
                  
        4542   ~1%    {7} r10 = JOIN r8 WITH `DataFlowImplCommon::CallContextSensitivity<UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage6Param::CallContextSensitivityInput>::PrunedViableImpl<DefaultPrunedViableImplInput>::LocalCallContext::getUnreachable/1#4e807359` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.0, Lhs.2, Lhs.3, Lhs.4, Lhs.5
        4542   ~0%    {7}    | JOIN WITH cached_DataFlowImplCommon::Cached::TSpecificLocalCall#05dd466c ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5, Lhs.6
        4542   ~0%    {7}    | JOIN WITH DataFlowImplCommon::LocalCallContext#6167e342 ON FIRST 1 OUTPUT Lhs.4, Lhs.1, Lhs.2, Lhs.3, Lhs.5, Lhs.6, Lhs.0
                  
  260207923   ~0%    {7} r11 = r9 UNION r10
    2115749   ~0%    {8}    | JOIN WITH num#UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::TAccessPathNil#dc6d0df5 ON FIRST 1 OUTPUT Lhs.1, _, Lhs.6, Lhs.2, Lhs.3, Lhs.0, Lhs.4, Lhs.5
    2115749   ~5%    {8}    | REWRITE WITH Out.1 := false
      251639   ~0%    {9}    | JOIN WITH `UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage5::LocalFlowBigStep<Stage4::LocalFlowBigStep<Stage3Param::localFlowBigStep>::localFlowBigStepTc>::localFlowBigStepTc/6#58c31154_024135#join_rhs` ON FIRST 3 OUTPUT Lhs.7, Rhs.3, Lhs.3, Lhs.4, Rhs.4, Lhs.5, Lhs.6, Rhs.5, _
      251639   ~0%    {9}    | REWRITE WITH Out.8 := false
                  
  310716532   ~0%    {9} r12 = r2 UNION r3 UNION r7 UNION r11
                      return r12

The underlying problem is that the dataflow computation specified by IgnorableOperationToOperationSourceCandidateConfig has way too many sources and sinks. So even though it is restricted to flows where the source and the sink are in the same enclosing callable there is way too many PathNodes.

To fix this problem we modify the ordering of dataflows. On main we do (ignoring some of the other dataflow configurations used in this query):

  1. An initial dataflow computation specified by IgnorableOperationToOperationSourceCandidateConfig to identify operations to ignore
  2. Use the result from (1) to filter out certain OperationSource, which then serve as the sources for the "final" dataflow computation specified by OperationToYearAssignmentConfig.

After this PR we do:

  1. An initial dataflow computation OperationToYearAssignmentConfig0 (which is equal to the original OperationToYearAssignmentConfig, but without the filtering done by IgnorableOperationToOperationSourceCandidateConfig)
  2. Another dataflow computation specified by IgnorableOperationToOperationSourceCandidateConfig. This is the same computation as we do on main, but now restricted to only relevant sinks (which we get from (1))
  3. A "final" dataflow computation specified by OperationToYearAssignmentConfig (which is equal to OperationToYearAssignmentConfig0, but with the restricted set of sinks from (2))

On the particular database I was testing on this now completes "trivially" (since sentinals now detect that the query will have 0 results since UncheckedLeapYearAfterYearModification::OperationToYearAssignmentFlow::Flow::S6::flowPath/2#9b05e974/2 is empty).

…er an initial round of the query to reduce the number of sinks.
Copilot AI review requested due to automatic review settings April 28, 2026 21:04
@MathiasVP MathiasVP requested a review from a team as a code owner April 28, 2026 21:04
@github-actions github-actions Bot added the C++ label Apr 28, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a severe performance regression in the C++ query cpp/leap-year/unchecked-after-arithmetic-year-modification by restructuring the query’s dataflow computations to drastically reduce the number of sources/sinks (and resulting PathNodes) considered during evaluation.

Changes:

  • Introduces an initial “pre-filter” dataflow (OperationToYearAssignmentConfig0 / OperationToYearAssignmentFlow0) to identify only operation-source candidates that can reach a year assignment.
  • Restricts the sinks of IgnorableOperationToOperationSourceCandidateConfig to only those operation-source candidates that are relevant to year assignments.
  • Refactors the final OperationToYearAssignmentConfig to use the restricted set and apply the ignorable-operation filtering in a later stage.
Show a summary per file
File Description
cpp/ql/src/Likely Bugs/Leap Year/UncheckedLeapYearAfterYearModification.ql Reorders and splits dataflow configurations to constrain sink/source sets and improve evaluation performance.

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 1


/**
* The set of all expressions which is a candidate expression and also does not flow from to to some ignorable expression (eg. bitwise op)
* The set of all expressions which is a candidate expression.
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment for OperationSource is grammatically incorrect/unclear (“The set of all expressions which is a candidate expression.”). Consider rephrasing to a plural form (for example, “The set of all expressions that are operation source candidates.”) so it reads cleanly and matches what the class represents.

Suggested change
* The set of all expressions which is a candidate expression.
* The set of all expressions that are operation source candidates.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* The set of all expressions which is a candidate expression.
* The set of all expressions that are candidate expression.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants