[Bug]: Invalid hypertable_compression_stats output until chunks are recompressed #7713

Open
jflambert opened this issue Feb 14, 2025 · 6 comments · May be fixed by #7901

Comments

jflambert commented Feb 14, 2025

What type of bug is this?

Incorrect result

What subsystems and features are affected?

Compression with the hypercore table access method (TAM).

What happened?

I'm unable to get compression stats unless I forcibly recompress chunks.

TimescaleDB version affected

2.18.1, 2.18.2, 2.19.0

PostgreSQL version used

16.6-16.8

What operating system did you use?

timescaledb-ha:pg16.6-ts2.18.1
timescaledb-ha:pg16.7-ts2.18.2
timescaledb-ha:pg16.8-ts2.19.0

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted

How can we reproduce the bug?

DROP TABLE IF EXISTS test_signals;
CREATE TABLE test_signals
(
    device_id uuid NOT NULL,
    channel text NOT NULL,
    signal text NOT NULL,
    signal_timestamp TIMESTAMPTZ NOT NULL,
    value float NOT NULL,
    PRIMARY KEY (device_id, channel, signal, signal_timestamp)
);
SELECT create_hypertable('test_signals', 'signal_timestamp', CHUNK_TIME_INTERVAL => INTERVAL '1 DAY');

ALTER TABLE test_signals SET (
  timescaledb.enable_columnstore,
  timescaledb.segmentby='device_id',
  timescaledb.orderby='channel, signal, signal_timestamp DESC'
);
ALTER TABLE test_signals SET ACCESS METHOD hypercore;

INSERT INTO test_signals(channel, device_id, signal_timestamp, signal, value)
SELECT 'chan', device_ids, signal_timestamps, signals, random()*100
FROM UNNEST(ARRAY[gen_random_uuid(), gen_random_uuid(), gen_random_uuid()]) AS device_ids,
GENERATE_SERIES('2025-02-01', '2025-02-11', '1 second'::INTERVAL) AS signal_timestamps,
UNNEST(ARRAY['signal1', 'signal_longer_name_2', 'signal_much_much_longer_name_3']::TEXT[]) AS signals;

SELECT pg_size_pretty(before_compression_total_bytes) AS before_compress, 
  pg_size_pretty(after_compression_total_bytes) AS after_compress, 
  pg_size_pretty(hypertable_size('test_signals')) AS hypertable_size
FROM hypertable_compression_stats('test_signals');

Initial table size is 1.7 GB.

 before_compress | after_compress | hypertable_size 
-----------------+----------------+-----------------
 0 bytes         | 176 kB         | 1736 MB
(1 row)

Let's compress.

SELECT compress_chunk(c) FROM show_chunks('test_signals') c;
SELECT pg_size_pretty(before_compression_total_bytes) AS before_compress, 
  pg_size_pretty(after_compression_total_bytes) AS after_compress, 
  pg_size_pretty(hypertable_size('test_signals')) AS hypertable_size
FROM hypertable_compression_stats('test_signals');

Stats don't show up at all. The table size is shrinking, presumably due to the autovacuum daemon.

 before_compress | after_compress | hypertable_size 
-----------------+----------------+-----------------
 0 bytes         | 176 kB         | 1542 MB
(1 row)

Let's decompress.

SELECT decompress_chunk(c) FROM show_chunks('test_signals') c;
SELECT pg_size_pretty(before_compression_total_bytes) AS before_compress, 
  pg_size_pretty(after_compression_total_bytes) AS after_compress, 
  pg_size_pretty(hypertable_size('test_signals')) AS hypertable_size
FROM hypertable_compression_stats('test_signals');

Table size neither grows nor shrinks at this point.

 before_compress | after_compress | hypertable_size 
-----------------+----------------+-----------------
                 |                | 1292 MB
(1 row)

Let's compress again.

SELECT compress_chunk(c) FROM show_chunks('test_signals') c;
SELECT pg_size_pretty(before_compression_total_bytes) AS before_compress, 
  pg_size_pretty(after_compression_total_bytes) AS after_compress, 
  pg_size_pretty(hypertable_size('test_signals')) AS hypertable_size
FROM hypertable_compression_stats('test_signals');

Finally, the view shows the expected values (though slightly different from the hypertable size).

 before_compress | after_compress | hypertable_size 
-----------------+----------------+-----------------
 1292 MB         | 66 MB          | 67 MB
(1 row)
@jflambert

Still a problem in 2.18.2 FYI

@jflambert

@antekresic to be clear, the bug happens regardless of access method. I shouldn't have included that in my replication steps I suppose.

@jflambert

Does anyone have a workaround for this? It's extremely frustrating to have to recompress chunks in order to get valid compression stats. Is there another view I can use?
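
One thing worth checking (hedged: chunk_compression_stats() reads the same stored numbers, so it may show the same stale values until chunks are recompressed) is the per-chunk view of these stats, which at least shows which chunks are affected:

-- Per-chunk view of the same compression stats; values may be equally
-- stale, but compression_status per chunk can help with diagnosis.
SELECT chunk_name, compression_status,
       pg_size_pretty(before_compression_total_bytes) AS before_compress,
       pg_size_pretty(after_compression_total_bytes) AS after_compress
FROM chunk_compression_stats('test_signals')
ORDER BY chunk_name;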

@erimatnor

I tried this on 2.18.2, but I couldn't reproduce the issue:

CREATE TABLE
     create_hypertable
---------------------------
 (1,public,test_signals,t)
(1 row)

ALTER TABLE
INSERT 0 7776009
 before_compress | after_compress | hypertable_size
-----------------+----------------+-----------------
                 |                | 1736 MB
(1 row)

postgres=#  SELECT pg_size_pretty(before_compression_total_bytes) AS before_compress,
postgres-#   pg_size_pretty(after_compression_total_bytes) AS after_compress,
postgres-#   pg_size_pretty(hypertable_size('test_signals')) AS hypertable_size
postgres-# FROM hypertable_compression_stats('test_signals');
 before_compress | after_compress | hypertable_size
-----------------+----------------+-----------------
                 |                | 1736 MB
(1 row)

postgres=# SELECT compress_chunk(c) FROM show_chunks('test_signals') c;
             compress_chunk
-----------------------------------------
 _timescaledb_internal._hyper_1_1_chunk
 _timescaledb_internal._hyper_1_2_chunk
 _timescaledb_internal._hyper_1_3_chunk
 _timescaledb_internal._hyper_1_4_chunk
 _timescaledb_internal._hyper_1_5_chunk
 _timescaledb_internal._hyper_1_6_chunk
 _timescaledb_internal._hyper_1_7_chunk
 _timescaledb_internal._hyper_1_8_chunk
 _timescaledb_internal._hyper_1_9_chunk
 _timescaledb_internal._hyper_1_10_chunk
 _timescaledb_internal._hyper_1_11_chunk
(11 rows)

postgres=# SELECT pg_size_pretty(before_compression_total_bytes) AS before_compress,
postgres-#   pg_size_pretty(after_compression_total_bytes) AS after_compress,
postgres-#   pg_size_pretty(hypertable_size('test_signals')) AS hypertable_size
postgres-# FROM hypertable_compression_stats('test_signals');
 before_compress | after_compress | hypertable_size
-----------------+----------------+-----------------
 1736 MB         | 66 MB          | 67 MB
(1 row)

@jflambert I am wondering what is different about your setup?


jflambert commented Mar 27, 2025

@erimatnor I was completely wrong when I said this!

@antekresic to be clear, the bug happens regardless of access method. I shouldn't have included that in my replication steps I suppose.

In fact, the bug only happens with the hypercore table access method (I've updated my steps).

However, I just tested with timescaledb-ha:pg16.8-ts2.19.0 and the problem still happens. Let me know :) @mkindahl might be interested in this issue too.

@erimatnor

@jflambert I managed to reproduce the issue. It turns out that this is an issue with how we handle compression stats. They are not updated when a chunk is recompressed using segmentwise recompression. Neither are the stats updated if data is transparently decompressed due to, e.g., updates. As a workaround, you can disable segmentwise recompression with a GUC.

The explanation for what happens when you use the hypercore table access method on the hypertable is as follows: new chunks are created using the table access method and are technically "compressed", but they contain no data. Then you insert data and run recompression (using the segmentwise approach), and the stats are not updated. Something similar would happen if you first compress a chunk without using the table access method, then backfill it and recompress: the stats won't be updated, so they no longer reflect the correct size after the backfill. In the extreme case, you backfill most of the data and the stats are completely wrong.

We could probably implement a partial solution by updating the stats if we do segmentwise recompression and there is no compressed data. But that still won't solve the backfill issue.
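
A minimal sketch of the GUC workaround mentioned above, assuming the setting is timescaledb.enable_segmentwise_recompression (the exact name may vary between versions, so verify it against your release):

-- Assumed GUC name: disabling segmentwise recompression makes compress_chunk
-- perform a full recompression, which should refresh the stored stats.
SET timescaledb.enable_segmentwise_recompression = off;
SELECT compress_chunk(c) FROM show_chunks('test_signals') c;

-- Stats should now be populated without a manual decompress/compress cycle.
SELECT pg_size_pretty(before_compression_total_bytes) AS before_compress,
  pg_size_pretty(after_compression_total_bytes) AS after_compress
FROM hypertable_compression_stats('test_signals');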
