Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement vectorized decompression for bools #7899

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dbeck
Copy link
Contributor

@dbeck dbeck commented Apr 1, 2025

This change decompresses into BitArray and add the supporting code into Simple8bRleBitArray. With this the ArrowArray stores bits and not bytes. A new DecompressionType was added: DT_ArrowBits to inform the decompression logic about the data format.

The TSL planner is also modified so it generates the vectorized decompression and filtering ops to take advantage of these. The Postgres planner creates plan nodes for boolean columns that we didn't handle before. For BoolCol = True it reduces the expression to a single Var, while for BoolCol = False it creates a BoolExpr(Not, Var) node. With this change we handle both cases.

@dbeck dbeck requested a review from akuzm April 1, 2025 17:13
@dbeck dbeck force-pushed the vectorized-bool-compression-3 branch 2 times, most recently from 521c902 to 10771d9 Compare April 1, 2025 20:41
@philkra philkra added this to the v2.20.0 milestone Apr 2, 2025
@dbeck dbeck force-pushed the vectorized-bool-compression-3 branch 2 times, most recently from 6178b9a to 96075ac Compare April 2, 2025 10:00
Copy link

codecov bot commented Apr 2, 2025

Codecov Report

Attention: Patch coverage is 81.67539% with 35 lines in your changes missing coverage. Please review.

Project coverage is 81.85%. Comparing base (59f50f2) to head (9af059f).
Report is 868 commits behind head on main.

Files with missing lines Patch % Lines
...src/compression/algorithms/simple8b_rle_bitarray.h 63.46% 1 Missing and 18 partials ⚠️
tsl/src/nodes/decompress_chunk/compressed_batch.c 79.41% 0 Missing and 7 partials ⚠️
tsl/test/src/compression_unit_test.c 54.54% 0 Missing and 5 partials ⚠️
tsl/src/compression/algorithms/bool_compress.c 93.33% 0 Missing and 2 partials ⚠️
src/adts/vec.h 85.71% 0 Missing and 1 partial ⚠️
tsl/src/nodes/decompress_chunk/planner.c 90.90% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7899      +/-   ##
==========================================
+ Coverage   80.06%   81.85%   +1.78%     
==========================================
  Files         190      250      +60     
  Lines       37181    46343    +9162     
  Branches     9450    11637    +2187     
==========================================
+ Hits        29770    37935    +8165     
- Misses       2997     3808     +811     
- Partials     4414     4600     +186     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dbeck dbeck force-pushed the vectorized-bool-compression-3 branch 11 times, most recently from d299691 to 00c9ffb Compare April 2, 2025 21:26
Comment on lines +321 to +323
(.state_code == "XX000"
and .error_severity != "LOG"
and (.statement | contains("ignore this error in CI") | not))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unrelated change

Copy link
Contributor Author

@dbeck dbeck Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to the change below. We do need to ignore this error message, because it is debug/test only and only used to verify the vectorized filtering works.

And yes, unrelated to the 'vectorized bool filtering'. It fixes a problem that we already had. I have no idea why it didn't cause issues before.

\set ON_ERROR_STOP 0
set timescaledb.debug_require_vector_qual to 'forbid';
select count(*) from vectorqual where metric4 > 4;
select count(*), 'ignore this error in CI' as x from vectorqual where metric4 > 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this?

Copy link
Contributor Author

@dbeck dbeck Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI collects the error messages and is unhappy seeing the one generated by set timescaledb.debug_require_vector_qual to 'forbid'; because it thinks this is a new internal error message being introduced.

I was thinking either add this debug/test error message to the list of error messages or ignore it, as this is debug/test only. I think it is better to ignore it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh this is because that code uses elog, i would prefer adjusting the code to not use elog and have proper sql state code instead of filtering this out

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe ERRCODE_QUERY_CANCELED would be more appropriate here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the purpose of this check is that it should be impossible to trigger the ERRCODE_INTERNAL_ERROR during normal program operation. It is reserved for bugs or catalog corruption.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE, the ERRCODE_QUERY_CANCELED is for operator intervention.

@dbeck dbeck force-pushed the vectorized-bool-compression-3 branch 4 times, most recently from 3c1d766 to 5fd8c2c Compare April 4, 2025 19:57
This change decompresses into BitArray and add the supporting
code into Simple8bRleBitArray. With this the ArrowArray stores
bits and not bytes. A new DecompressionType was added: DT_ArrowBits
to inform the decompression logic about the data format.

The TSL planner is also modified so it generates the vectorized
decompression and filtering ops to take advantage of these. The
Postgres planner creates plan nodes for boolean columns that we
didn't handle before. For BoolCol = True it reduces the expression
to a single Var, while for BoolCol = False it creates a
BoolExpr(Not, Var) node. With this change we handle both cases.
@dbeck dbeck force-pushed the vectorized-bool-compression-3 branch from 5fd8c2c to 9af059f Compare April 4, 2025 20:58
@dbeck dbeck enabled auto-merge (rebase) April 5, 2025 15:05
@dbeck dbeck disabled auto-merge April 5, 2025 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants