[Enhancement] Removes reserve() from array_agg(). #56958

Merged: 1 commit, Mar 17, 2025

Conversation

jkim650
Contributor

@jkim650 jkim650 commented Mar 14, 2025

Why I'm doing:

While debugging a slow query that took a few minutes, we found that the cause was skewed data.
However, it was not clear why skewed data caused such high latency. After further investigation, I found that the cause was unnecessary reserve() calls in array_agg().

After a large array was appended to the result column for one aggregated row, each subsequent finalize_to_column() call took a few ms even when it added only a small number of elements, most likely because it had to allocate new memory and copy the existing values. Across thousands of aggregated rows, a few ms per row added up to a lot of latency.
Repeatedly calling reserve() with small increases is harmful. Without reserve(), the subsequent append calls let std::vector grow its capacity exponentially, which is efficient.
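
As an illustration of the effect described above, here is a minimal standalone sketch. It is not the StarRocks code: `count_reallocations` and its parameters are hypothetical names, and a plain `std::vector<int>` stands in for the result column. It counts how often the buffer is reallocated when every small batch is preceded by `reserve(size() + batch_size)`, compared with plain `push_back` calls.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not StarRocks code): appends `batches` batches of
// `batch_size` ints to a vector and counts how often its capacity changes,
// i.e. how often the buffer is reallocated and existing elements are copied.
std::size_t count_reallocations(std::size_t batches, std::size_t batch_size,
                                bool reserve_before_each_batch) {
    std::vector<int> column;
    std::size_t reallocations = 0;
    std::size_t last_capacity = column.capacity();

    // Records a reallocation whenever the capacity differs from the last seen value.
    auto note_capacity = [&]() {
        if (column.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = column.capacity();
        }
    };

    for (std::size_t b = 0; b < batches; ++b) {
        if (reserve_before_each_batch) {
            // Assumed pattern: request room for exactly one more small batch.
            column.reserve(column.size() + batch_size);
            note_capacity();
        }
        for (std::size_t i = 0; i < batch_size; ++i) {
            column.push_back(static_cast<int>(i));
            note_capacity();
        }
    }
    return reallocations;
}
```

On common standard library implementations, where reserve() allocates only the requested capacity, `count_reallocations(1000, 4, true)` should report roughly one reallocation per batch, while `count_reallocations(1000, 4, false)` should report only a logarithmic number. The standard only guarantees that capacity is at least the requested value, so exact counts may vary.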

What I'm doing:

Removes the reserve() calls from array_agg().

In a test on skewed data, the latency of a query with array_agg() decreased from 2m45s to 21s.

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.4
    • 3.3
    • 3.2
    • 3.1
    • 3.0

@jkim650 jkim650 requested a review from a team as a code owner March 14, 2025 17:31
[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

@alvin-celerdata
Contributor

@jkim650 thank you for this PR. Could you sign off this PR using git commit -s?

@stdpain stdpain merged commit 3757748 into StarRocks:main Mar 17, 2025
57 of 64 checks passed
@chaoyli
Contributor

chaoyli commented Mar 17, 2025

@mergify backport branch-3.4 branch3.3

Contributor

mergify bot commented Mar 17, 2025

backport branch-3.4 branch3.3

❌ No backport have been created

GitHub error: Branch not found

@chaoyli
Contributor

chaoyli commented Mar 17, 2025

@mergify backport branch-3.3

Contributor

mergify bot commented Mar 17, 2025

backport branch-3.3

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Mar 17, 2025
mergify bot pushed a commit that referenced this pull request Mar 17, 2025
crossoverJie pushed a commit to crossoverJie/starrocks that referenced this pull request Mar 26, 2025