[Bug]: Collation error on duplicate insert into a compressed chunk with orderby containing text column #7342


Closed
hackbnw opened this issue Oct 14, 2024 · 2 comments · Fixed by #7345

@hackbnw

hackbnw commented Oct 14, 2024

What type of bug is this?

Unexpected error

What subsystems and features are affected?

Compression, Query executor

What happened?

Hello,

We encountered the following behavior, which causes errors when inserting older data that is partially duplicated. Do you know what might be the reason?

Given:

  • The hypertable has a composite unique index that includes a text column with the default collation
  • The table has a compression policy segmented by a numeric device_id column; the compression order includes the remaining two columns from the unique index, the time and text columns (this is also the default in this scenario when the timescaledb.compress_* options are omitted)
  • Some data is already present in a compressed chunk

When:

  • Inserting two or more rows whose keys already exist in the compressed chunk, while the INSERT's column list is not identical to the full list of table columns in definition order (e.g. columns are reordered, or columns with default values are omitted)

Expected:

  • The query returns "INSERT 0 0" and nothing happens (behavior observed in 2.15.3)

Actual behavior:

  • Error "could not determine which collation to use for string comparison" (observed in 2.16.0, 2.16.1 and 2.17.0)

When this does not happen:

  • Inserting only one row instead of two
  • Listing all table columns and values in exactly the same order as they appear in the table definition in the INSERT statement
  • When the existing data is stored in an uncompressed chunk
  • Changing the compression settings to segment by the text column and order by the numeric columns only
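For reference, the last point corresponds to compression settings roughly like the following (a sketch using the column names from the repro below, not the exact statements we ran):

```sql
-- Alternative settings under which the error did not occur:
-- segment by the text column, order by the numeric columns only.
ALTER TABLE test_collation
SET (
	timescaledb.compress,
	timescaledb.compress_segmentby = 'name',
	timescaledb.compress_orderby = 'time DESC, device_id'
);
```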

What did not work:

  • Setting an explicit collation on the column, the index, and the inserted values
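For illustration, the explicit-collation attempts looked roughly like this (a sketch assuming the "C" collation; the exact collations and statements we tried varied, and none of them avoided the error):

```sql
-- Explicit collation on the column (requires recreating the
-- dependent unique index afterwards):
ALTER TABLE test_collation
	ALTER COLUMN name TYPE TEXT COLLATE "C";

-- Explicit collation on the inserted values:
INSERT INTO test_collation (device_id, time, name)
VALUES (41, 1609477200000, 'val1' COLLATE "C")
ON CONFLICT DO NOTHING;
```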

TimescaleDB version affected

2.16.0 - 2.17.0

PostgreSQL version used

15.8 - 16.3

What operating system did you use?

Ubuntu 22.04

What installation method did you use?

Docker, Other

What platform did you run on?

On prem/Self-hosted, Timescale Cloud

Relevant log output and stack trace

$ docker run -d   --rm   --name timescaledb   -p 5432:5432   -e POSTGRES_PASSWORD=postgres timescale/timescaledb-ha:pg16.3-ts2.16.0
ed3e63ed1f59d21c48e33bfd5b779b12f0aa5e8ae0790d9747feafb24654342c
$ docker exec -it timescaledb psql -d "postgres://postgres:postgres@localhost/postgres"
psql (16.3 (Ubuntu 16.3-1.pgdg22.04+1))
Type "help" for help.

postgres=# CREATE TABLE test_collation (
        time int8 NOT NULL,
        device_id int4 NOT NULL,
        name TEXT NOT NULL,
        CONSTRAINT test_collation_pkey PRIMARY KEY (time, device_id, name)
);
CREATE TABLE
postgres=# SELECT create_hypertable('test_collation', 'time', chunk_time_interval => 2419200000);
      create_hypertable      
-----------------------------
 (1,public,test_collation,t)
(1 row)

postgres=# ALTER TABLE test_collation
SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'device_id',
        timescaledb.compress_orderby = 'time DESC, name'
);
ALTER TABLE
postgres=# INSERT INTO "test_collation"
  ("time", "device_id", "name")
VALUES
  (1609477200000, 41, 'val1'),
  (1609478100000, 41, 'val1')
ON CONFLICT DO NOTHING;
INSERT 0 2
postgres=# SELECT compress_chunk(ch) FROM show_chunks('test_collation') ch;
             compress_chunk             
----------------------------------------
 _timescaledb_internal._hyper_1_1_chunk
(1 row)

postgres=# INSERT INTO "test_collation"
  ("device_id", "time", "name")
VALUES
  (41, 1609477200000, 'val1'),
  (41, 1609478100000, 'val1')
ON CONFLICT DO NOTHING;
ERROR:  could not determine which collation to use for string comparison
HINT:  Use the COLLATE clause to set the collation explicitly.

How can we reproduce the bug?

# Start up the Docker container with either 2.15.3 (baseline, working)
docker run -d \
  --rm \
  --name timescaledb \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=postgres timescale/timescaledb-ha:pg16.3-ts2.15.3

# or 2.16.0 (to reproduce the issue)
docker run -d \
  --rm \
  --name timescaledb \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=postgres timescale/timescaledb-ha:pg16.3-ts2.16.0

# Connect to database
docker exec -it timescaledb psql -d "postgres://postgres:postgres@localhost/postgres"

# Paste the following and press Enter
DROP TABLE IF EXISTS test_collation;

CREATE TABLE test_collation (
	time int8 NOT NULL,
	device_id int4 NOT NULL,
	name TEXT NOT NULL,
	CONSTRAINT test_collation_pkey PRIMARY KEY (time, device_id, name)
);

SELECT create_hypertable('test_collation', 'time', chunk_time_interval => 2419200000);
ALTER TABLE test_collation
SET (
	timescaledb.compress,
	timescaledb.compress_segmentby = 'device_id',
	timescaledb.compress_orderby = 'time DESC, name'
);

INSERT INTO "test_collation"
  ("time", "device_id", "name")
VALUES
  (1609477200000, 41, 'val1'),
  (1609478100000, 41, 'val1')
ON CONFLICT DO NOTHING;

SELECT compress_chunk(ch) FROM show_chunks('test_collation') ch;

-- This fails in 2.16.0
INSERT INTO "test_collation"
  ("device_id", "time", "name")
VALUES
  (41, 1609477200000, 'val1'),
  (41, 1609478100000, 'val1')
ON CONFLICT DO NOTHING;
@hackbnw hackbnw added the bug label Oct 14, 2024
@antekresic
Member

This looks like a bug in our tuple filtering optimization that shipped in 2.16.

I'll create a bugfix for 2.17.1, but in the meantime the workaround is to disable tuple filtering:
set timescaledb.enable_dml_decompression_tuple_filtering to off;
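Until the fix is released, the GUC above can be applied per session or persisted using standard PostgreSQL mechanisms (a sketch; the database name is the one from the repro):

```sql
-- Per session:
SET timescaledb.enable_dml_decompression_tuple_filtering TO off;

-- Persist for a database (takes effect in new sessions):
ALTER DATABASE postgres
	SET timescaledb.enable_dml_decompression_tuple_filtering TO off;
```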

@hackbnw
Author

hackbnw commented Oct 16, 2024

Thanks a lot, @antekresic !

fabriziomello added a commit to fabriziomello/timescaledb that referenced this issue Oct 21, 2024
This release contains performance improvements and bug fixes since
the 2.17.0 release. We recommend that you upgrade at the next
available opportunity.

**Features**
* timescale#7360 Add chunk skipping GUC

**Bugfixes**
* timescale#7335 Change log level used in compression
* timescale#7342 Fix collation for in-memory tuple filtering

**Thanks**
* @gmilamjr for reporting an issue with the log level of compression messages
* @hackbnw for reporting an issue with collation during tuple filtering