[Bug]: Collation error on duplicate insert into a compressed chunk with orderby containing text column #7342


Closed
hackbnw opened this issue Oct 14, 2024 · 2 comments · Fixed by #7345

@hackbnw

hackbnw commented Oct 14, 2024

What type of bug is this?

Unexpected error

What subsystems and features are affected?

Compression, Query executor

What happened?

Hello,

We encountered the following behavior, which causes errors when inserting older data that is partially duplicated. Do you know what might be the reason?

Given:

  • The hypertable has a composite unique index that includes a text column with the default collation
  • The table has a compression policy segmented by a numeric device_id column; the compression order includes the remaining two columns from the unique index, the time and text columns (this is also the default in this scenario when the timescaledb.compress_* options are omitted)
  • Some data is already present in a compressed chunk

When:

  • Inserting two or more rows whose keys already exist in the compressed chunk, while the INSERT's column list is not identical to the full list of table columns in definition order (e.g. columns are reordered, or columns with default values are omitted)

Expected:

  • The query returns "INSERT 0 0" and nothing happens (behavior observed in 2.15.3)

Actual behavior:

  • Error "could not determine which collation to use for string comparison" (observed in 2.16.0, 2.16.1 and 2.17.0)

When this does not happen:

  • Inserting only one row instead of two
  • Listing all table columns and values in exactly the same order as they appear in the table definition in the INSERT statement
  • When the existing data is stored in an uncompressed chunk
  • Changing the compression settings to segment by the text column and order by the numeric columns only
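For reference, the last point corresponds to compression settings roughly like the following (a sketch using the column names from the repro below, not the exact statements we ran):

```sql
-- Alternative settings under which the error did not occur:
-- segment by the text column, order by the numeric columns only.
ALTER TABLE test_collation
SET (
	timescaledb.compress,
	timescaledb.compress_segmentby = 'name',
	timescaledb.compress_orderby = 'time DESC, device_id'
);
```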

What did not work:

  • Setting an explicit collation on the column, the index, and the inserted values
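For illustration, the explicit-collation attempts looked roughly like this (a sketch assuming the "C" collation; the exact collations and statements we tried varied, and none of them avoided the error):

```sql
-- Explicit collation on the column (requires recreating the
-- dependent unique index afterwards):
ALTER TABLE test_collation
	ALTER COLUMN name TYPE TEXT COLLATE "C";

-- Explicit collation on the inserted values:
INSERT INTO test_collation (device_id, time, name)
VALUES (41, 1609477200000, 'val1' COLLATE "C")
ON CONFLICT DO NOTHING;
```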

TimescaleDB version affected

2.16.0 - 2.17.0

PostgreSQL version used

15.8 - 16.3

What operating system did you use?

Ubuntu 22.04

What installation method did you use?

Docker, Other

What platform did you run on?

On prem/Self-hosted, Timescale Cloud

Relevant log output and stack trace

$ docker run -d   --rm   --name timescaledb   -p 5432:5432   -e POSTGRES_PASSWORD=postgres timescale/timescaledb-ha:pg16.3-ts2.16.0
ed3e63ed1f59d21c48e33bfd5b779b12f0aa5e8ae0790d9747feafb24654342c
$ docker exec -it timescaledb psql -d "postgres://postgres:postgres@localhost/postgres"
psql (16.3 (Ubuntu 16.3-1.pgdg22.04+1))
Type "help" for help.

postgres=# CREATE TABLE test_collation (
        time int8 NOT NULL,
        device_id int4 NOT NULL,
        name TEXT NOT NULL,
        CONSTRAINT test_collation_pkey PRIMARY KEY (time, device_id, name)
);
CREATE TABLE
postgres=# SELECT create_hypertable('test_collation', 'time', chunk_time_interval => 2419200000);
      create_hypertable      
-----------------------------
 (1,public,test_collation,t)
(1 row)

postgres=# ALTER TABLE test_collation
SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'device_id',
        timescaledb.compress_orderby = 'time DESC, name'
);
ALTER TABLE
postgres=# INSERT INTO "test_collation"
  ("time", "device_id", "name")
VALUES
  (1609477200000, 41, 'val1'),
  (1609478100000, 41, 'val1')
ON CONFLICT DO NOTHING;
INSERT 0 2
postgres=# SELECT compress_chunk(ch) FROM show_chunks('test_collation') ch;
             compress_chunk             
----------------------------------------
 _timescaledb_internal._hyper_1_1_chunk
(1 row)

postgres=# INSERT INTO "test_collation"
  ("device_id", "time", "name")
VALUES
  (41, 1609477200000, 'val1'),
  (41, 1609478100000, 'val1')
ON CONFLICT DO NOTHING;
ERROR:  could not determine which collation to use for string comparison
HINT:  Use the COLLATE clause to set the collation explicitly.

How can we reproduce the bug?

# Start up the Docker container with either 2.15.3 (baseline, working)
docker run -d \
  --rm \
  --name timescaledb \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=postgres timescale/timescaledb-ha:pg16.3-ts2.15.3

# or 2.16.0 (to reproduce the issue)
docker run -d \
  --rm \
  --name timescaledb \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=postgres timescale/timescaledb-ha:pg16.3-ts2.16.0

# Connect to database
docker exec -it timescaledb psql -d "postgres://postgres:postgres@localhost/postgres"

# Paste the following and press Enter
DROP TABLE IF EXISTS test_collation;

CREATE TABLE test_collation (
	time int8 NOT NULL,
	device_id int4 NOT NULL,
	name TEXT NOT NULL,
	CONSTRAINT test_collation_pkey PRIMARY KEY (time, device_id, name)
);

SELECT create_hypertable('test_collation', 'time', chunk_time_interval => 2419200000);
ALTER TABLE test_collation
SET (
	timescaledb.compress,
	timescaledb.compress_segmentby = 'device_id',
	timescaledb.compress_orderby = 'time DESC, name'
);

INSERT INTO "test_collation"
  ("time", "device_id", "name")
VALUES
  (1609477200000, 41, 'val1'),
  (1609478100000, 41, 'val1')
ON CONFLICT DO NOTHING;

SELECT compress_chunk(ch) FROM show_chunks('test_collation') ch;

-- This fails in 2.16.0
INSERT INTO "test_collation"
  ("device_id", "time", "name")
VALUES
  (41, 1609477200000, 'val1'),
  (41, 1609478100000, 'val1')
ON CONFLICT DO NOTHING;
@hackbnw hackbnw added the bug label Oct 14, 2024
@antekresic
Member

This looks like a bug in our tuple filtering optimization that shipped in 2.16.

I'll create a bugfix for 2.17.1, but in the meantime the workaround is to disable tuple filtering:
set timescaledb.enable_dml_decompression_tuple_filtering to off;
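Until the fix is released, the GUC above can be applied per session or persisted using standard PostgreSQL mechanisms (a sketch; the database name is the one from the repro):

```sql
-- Per session:
SET timescaledb.enable_dml_decompression_tuple_filtering TO off;

-- Persist for a database (takes effect in new sessions):
ALTER DATABASE postgres
	SET timescaledb.enable_dml_decompression_tuple_filtering TO off;
```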

@hackbnw
Author

hackbnw commented Oct 16, 2024

Thanks a lot, @antekresic !

fabriziomello added a commit to fabriziomello/timescaledb that referenced this issue Oct 21, 2024
This release contains performance improvements and bug fixes since
the 2.17.0 release. We recommend that you upgrade at the next
available opportunity.

**Features**
* timescale#7360 Add chunk skipping GUC

**Bugfixes**
* timescale#7335 Change log level used in compression
* timescale#7342 Fix collation for in-memory tuple filtering

**Thanks**
* @gmilamjr for reporting an issue with the log level of compression messages
* @hackbnw for reporting an issue with collation during tuple filtering