[Bug]: Wrong result returned after compressing column, with alter table, new column with default value #7714

Closed
dbeck opened this issue Feb 14, 2025 · 1 comment · Fixed by #7798

dbeck commented Feb 14, 2025

What type of bug is this?

Incorrect result

What subsystems and features are affected?

Compression

What happened?

I'm creating this bug after Sven Klemm found the issue and Alexander Kuzmenkov provided a further repro.
In the two repros we:

  • create a hypertable
  • enable compression
  • insert a record
  • alter the table by adding a new column with a default value
  • update the table with a null value / insert a record with a null value
  • compress

After these steps we expect the compressed table to return the null value, but instead it returns the default value.
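The ambiguity behind these steps can be simulated outside the database. The following is a minimal Python sketch (illustrative only, not TimescaleDB code) of how an all-NULL column block can be confused with a column that is simply missing from an older compressed batch and should therefore take the table default:

```python
# Buggy behavior sketch: a compressed batch stores one block per column.
# A column added after the batch was written has no block, and decompression
# fills it with the default. If an all-NULL block is also stored as "absent",
# the two cases become indistinguishable.

DEFAULT = 42          # default added by ALTER TABLE ... DEFAULT 42

def compress_column(values):
    # Bug: a block where every value is NULL is stored as "absent",
    # exactly like a column that did not exist when the batch was written.
    if all(v is None for v in values):
        return None   # indistinguishable from "column missing"
    return list(values)

def decompress_column(block, default, n_rows):
    if block is None:
        # Absent block -> assume the column was added later, fill with default.
        return [default] * n_rows
    return block

block = compress_column([None])                           # user stored NULL
print(decompress_column(block, DEFAULT, 1))               # [42], not [None]
```

This reproduces the symptom above: the explicitly stored NULL comes back as the default value 42.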

TimescaleDB version affected

2.18.0

PostgreSQL version used

16.6

What operating system did you use?

Ubuntu 22.04 x86

What installation method did you use?

Source

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

How can we reproduce the bug?

--
-- Repro 1: update value
-- Credit: Sven Klemm
--
create table t1 (ts int);
select create_hypertable('t1', 'ts');
alter table t1 set(timescaledb.compress);
insert into t1 (ts) values (1);
alter table t1 add column c1 int default 42;
update t1 set c1 = null;
select compress_chunk(show_chunks('t1'));
select * from t1;
 ts | c1 
----+----
  1 | 42
(1 row)

--
-- Repro 2: insert value
-- Credit: Alexander Kuzmenkov
-- 
create table ttt(ts int);
select create_hypertable('ttt', 'ts');
insert into ttt values (1);
alter table ttt set (timescaledb.compress, timescaledb.compress_segmentby = 'ts');
select compress_chunk(show_chunks('ttt'));
alter table ttt add column a int default 7;
insert into ttt values (2, null);
set timescaledb.enable_segmentwise_recompression to off;
select compress_chunk(show_chunks('ttt'));
select * from ttt;

 ts │ a 
────┼───
  1 │ 7
  2 │ 7
@dbeck dbeck added the bug label Feb 14, 2025
@dbeck dbeck self-assigned this Feb 14, 2025
dbeck commented Feb 14, 2025

Potentially related issue: when I alter-added the new column I set the default value to 42, then altered the default to 99 and finally to 33. Somehow the 42 came back after compression.

# alter table t add column c2 int default 42;
# alter table t alter column c2 set default 99;
# insert into t (ts) values (1);
INSERT 0 1
dbeck=# select * from t;
 ts | c1 | c2 
----+----+----
  1 | 43 | 99
(1 row)

# alter table t alter column c2 set default 33;
# insert into t (ts) values (2);
# select * from t;
 ts | c1 | c2 
----+----+----
  1 | 43 | 99
  2 | 43 | 33
(2 rows)

# select compress_chunk(show_chunks('t'));
# select * from t;
 ts | c1 | c2 
----+----+----
  2 | 43 | 33
  1 | 43 | 99
(2 rows)

# update t set c1=null, c2=null;
# select * from t;
 ts | c1 | c2 
----+----+----
  2 |    |   
  1 |    |   
(2 rows)

# select compress_chunk(show_chunks('t'));
# select * from t;
 ts | c1 | c2 
----+----+----
  2 |    | 42
  1 |    | 42
(2 rows)

# \d+ t
                                            Table "public.t"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description 
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 ts     | integer |           | not null |         | plain   |             |              | 
 c1     | integer |           |          | 43      | plain   |             |              | 
 c2     | integer |           |          | 33      | plain   |             |              | 
Indexes:
    "t_ts_idx" btree (ts DESC)
Triggers:
    ts_insert_blocker BEFORE INSERT ON t FOR EACH ROW EXECUTE FUNCTION _timescaledb_functions.insert_blocker()
Child tables: _timescaledb_internal._hyper_4_4_chunk
Access method: heap
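A plausible explanation for the 42 resurfacing (an assumption about the mechanism, not a quote from TimescaleDB source): PostgreSQL remembers the default that was in effect when the column was added (the "missing value", stored in `pg_attribute.attmissingval`) separately from the current default, and the buggy decompression path falls back to that frozen missing value rather than the current default. A small Python sketch of that distinction:

```python
# Sketch of the suspected mechanism: the "missing value" is frozen at
# ADD COLUMN time, while SET DEFAULT only changes the current default.

class Column:
    def __init__(self, missing_value, current_default):
        self.missing_value = missing_value      # frozen at ADD COLUMN time (42)
        self.current_default = current_default  # changed later by SET DEFAULT (33)

c2 = Column(missing_value=42, current_default=33)

def decompress(block, column, n_rows):
    if block is None:  # all-NULL block stored as absent (the bug)
        # Fallback uses the frozen missing value, not the current default.
        return [column.missing_value] * n_rows
    return block

print(decompress(None, c2, 2))  # [42, 42] -- matches the output above
```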

dbeck added a commit to dbeck/timescaledb that referenced this issue Mar 6, 2025
This fixes bug timescale#7714 where adding a column with a default
value (jargon: missing value) and a compressed batch with
all nulls created an ambiguity: in the all-null case the
compressed block was stored as a NULL value.

With this change, I introduce a new special compression
type, 'NULL' compression, which is a single-byte placeholder
for an 'all-null' compressed block. This allows us to
distinguish between the missing value and the all-null
values.

Please note that the wrong results impacted existing tests,
so I updated the expected results and added reference
queries before compression to prove the previous values
were wrong.

A new debug-only GUC was added for testing a future upgrade
script, which will arrive as a separate PR.

A utility function that checks whether a value is missing
from a tuple was also added, to support the upgrade script
in that future PR.
@dbeck dbeck closed this as completed in 9c99326 Mar 11, 2025
philkra added a commit that referenced this issue Mar 12, 2025
## 2.19.0 (2025-03-12)

This release contains performance improvements and bug fixes since 
the 2.18.2 release. We recommend that you upgrade at the next 
available opportunity.

**Features**
* [#7586](#7586) Vectorized aggregation with grouping by a single text column.
* [#7632](#7632) Optimize recompression for chunks without segmentby
* [#7655](#7655) Support vectorized aggregation on Hypercore TAM
* [#7669](#7669) Add support for merging compressed chunks
* [#7701](#7701) Implement a custom compression algorithm for bool columns. It is experimental and can undergo backwards-incompatible changes. For testing, enable it using timescaledb.enable_bool_compression = on.
* [#7707](#7707) Support ALTER COLUMN SET NOT NULL on compressed chunks
* [#7765](#7765) Allow tsdb as alias for timescaledb in WITH and SET clauses
* [#7786](#7786) Show warning for inefficient compress_chunk_time_interval configuration
* [#7788](#7788) Add callback to mem_guard for background workers
* [#7789](#7789) Do not recompress segmentwise when default order by is empty
* [#7790](#7790) Add configurable Incremental CAgg Refresh Policy

**Bugfixes**
* [#7665](#7665) Block merging of frozen chunks
* [#7673](#7673) Don't abort additional INSERTs when hitting first conflict
* [#7714](#7714) Fix a wrong result where compressed NULL values were confused with default values. This happened in special circumstances where ALTER TABLE added a new column with a default value, followed by an update and compression in a particular order.
* [#7747](#7747) Block TAM rewrites with incompatible GUC setting
* [#7748](#7748) Crash in the segmentwise recompression
* [#7764](#7764) Fix compression settings handling in Hypercore TAM
* [#7768](#7768) Remove costing index scan of hypertable parent
* [#7799](#7799) Handle DEFAULT table access name in ALTER TABLE

**Thanks**
* @bjornuppeke for reporting a problem with INSERT INTO ... ON CONFLICT DO NOTHING on compressed chunks
* @kav23alex for reporting a segmentation fault on ALTER TABLE with DEFAULT

Signed-off-by: Philip Krauss <[email protected]>
philkra added a commit that referenced this issue Mar 18, 2025
## 2.19.0 (2025-03-18)

This release contains performance improvements and bug fixes since the
2.18.2 release. We recommend that you upgrade at the next available
opportunity.

* Improved concurrency of INSERT, UPDATE and DELETE operations on the columnstore by no longer blocking DML statements during the recompression of a chunk.
* Improved system performance during Continuous Aggregate refreshes by breaking them into smaller batches, which reduces system pressure and minimizes the risk of spilling to disk.
* Faster and more up-to-date results for queries against Continuous Aggregates by materializing the most recent data first (vs. old data first in prior versions).
* Faster analytical queries with SIMD vectorization of aggregations over text columns and GROUP BY over multiple columns.
* Enable optimizing chunk size for faster query performance on the columnstore by adding support for merging columnstore chunks to the merge_chunk API.

**Deprecation warning**

This is the last minor release supporting PostgreSQL 14. Starting with the next minor version, TimescaleDB will support only PostgreSQL 15, 16 and 17.

**Downgrading of 2.19.0**

This release introduces custom bool compression. If you enabled this feature via `enable_bool_compression` and must downgrade to a previous version, please use the [following script](https://github.com/timescale/timescaledb-extras/blob/master/utils/2.19.0-downgrade_new_compression_algorithms.sql) to convert the columns back to their previous state. TimescaleDB versions prior to 2.19.0 do not know how to handle this new type.

**Features**
* [#7586](#7586) Vectorized aggregation with grouping by a single text column.
* [#7632](#7632) Optimize recompression for chunks without segmentby
* [#7655](#7655) Support vectorized aggregation on Hypercore TAM
* [#7669](#7669) Add support for merging compressed chunks
* [#7701](#7701) Implement a custom compression algorithm for bool columns. It is experimental and can undergo backwards-incompatible changes. For testing, enable it using timescaledb.enable_bool_compression = on.
* [#7707](#7707) Support ALTER COLUMN SET NOT NULL on compressed chunks
* [#7765](#7765) Allow tsdb as alias for timescaledb in WITH and SET clauses
* [#7786](#7786) Show warning for inefficient compress_chunk_time_interval configuration
* [#7788](#7788) Add callback to mem_guard for background workers
* [#7789](#7789) Do not recompress segmentwise when default order by is empty
* [#7790](#7790) Add configurable Incremental CAgg Refresh Policy

**Bugfixes**
* [#7665](#7665) Block merging of frozen chunks
* [#7673](#7673) Don't abort additional INSERTs when hitting first conflict
* [#7714](#7714) Fix a wrong result where compressed NULL values were confused with default values. This happened in special circumstances where ALTER TABLE added a new column with a default value, followed by an update and compression in a particular order.
* [#7747](#7747) Block TAM rewrites with incompatible GUC setting
* [#7748](#7748) Crash in the segmentwise recompression
* [#7764](#7764) Fix compression settings handling in Hypercore TAM
* [#7768](#7768) Remove costing index scan of hypertable parent
* [#7799](#7799) Handle DEFAULT table access name in ALTER TABLE

**GUCs**
* `enable_bool_compression`: enable the BOOL compression algorithm, default: `OFF`
* `enable_exclusive_locking_recompression`: enable exclusive locking during recompression (legacy mode), default: `OFF`

**Thanks**
* @bjornuppeke for reporting a problem with INSERT INTO ... ON CONFLICT DO NOTHING on compressed chunks
* @kav23alex for reporting a segmentation fault on ALTER TABLE with DEFAULT

---------

Signed-off-by: Philip Krauss <[email protected]>
Signed-off-by: Ramon Guiu <[email protected]>
Co-authored-by: Ramon Guiu <[email protected]>