This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Commit 66a5f6c

Add a unique index to state_group_edges to prevent duplicates being accidentally introduced and the consequential impact to performance. (#12687)
1 parent f16ec05 commit 66a5f6c

5 files changed: +139 -0 lines changed

changelog.d/12687.bugfix

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Add a unique index to `state_group_edges` to prevent duplicates being accidentally introduced and the consequential impact to performance.

docs/upgrade.md

Lines changed: 90 additions & 0 deletions
@@ -89,6 +89,96 @@ process, for example:
 dpkg -i matrix-synapse-py3_1.3.0+stretch1_amd64.deb
 ```
 
+# Upgrading to v1.60.0
+
+## Adding a new unique index to `state_group_edges` could fail if your database is corrupted
+
+This release of Synapse will add a unique index to the `state_group_edges` table, in order
+to prevent accidentally introducing duplicate information (for example, because a database
+backup was restored multiple times).
+
+Duplicate rows being present in this table could cause drastic performance problems; see
+[issue 11779](https://github.com/matrix-org/synapse/issues/11779) for more details.
+
+If your Synapse database already has had duplicate rows introduced into this table,
+this could fail, with either of these errors:
+
+
+**On Postgres:**
+```
+synapse.storage.background_updates - 623 - INFO - background_updates-0 - Adding index state_group_edges_unique_idx to state_group_edges
+synapse.storage.background_updates - 282 - ERROR - background_updates-0 - Error doing update
+...
+psycopg2.errors.UniqueViolation: could not create unique index "state_group_edges_unique_idx"
+DETAIL: Key (state_group, prev_state_group)=(2, 1) is duplicated.
+```
+(The numbers may be different.)
+
+**On SQLite:**
+```
+synapse.storage.background_updates - 623 - INFO - background_updates-0 - Adding index state_group_edges_unique_idx to state_group_edges
+synapse.storage.background_updates - 282 - ERROR - background_updates-0 - Error doing update
+...
+sqlite3.IntegrityError: UNIQUE constraint failed: state_group_edges.state_group, state_group_edges.prev_state_group
+```
+
+
+<details>
+<summary><b>Expand this section for steps to resolve this problem</b></summary>
+
+### On Postgres
+
+Connect to your database with `psql`.
+
+```sql
+BEGIN;
+DELETE FROM state_group_edges WHERE (ctid, state_group, prev_state_group) IN (
+  SELECT row_id, state_group, prev_state_group
+  FROM (
+    SELECT
+      ctid AS row_id,
+      MIN(ctid) OVER (PARTITION BY state_group, prev_state_group) AS min_row_id,
+      state_group,
+      prev_state_group
+    FROM state_group_edges
+  ) AS t1
+  WHERE row_id <> min_row_id
+);
+COMMIT;
+```
+
+
+### On SQLite
+
+At the command-line, use `sqlite3 path/to/your-homeserver-database.db`:
+
+```sql
+BEGIN;
+DELETE FROM state_group_edges WHERE (rowid, state_group, prev_state_group) IN (
+  SELECT row_id, state_group, prev_state_group
+  FROM (
+    SELECT
+      rowid AS row_id,
+      MIN(rowid) OVER (PARTITION BY state_group, prev_state_group) AS min_row_id,
+      state_group,
+      prev_state_group
+    FROM state_group_edges
+  )
+  WHERE row_id <> min_row_id
+);
+COMMIT;
+```
+
+
+### For more details
+
+[This comment on issue 11779](https://github.com/matrix-org/synapse/issues/11779#issuecomment-1131545970)
+has queries that can be used to check a database for this problem in advance.
+
+</details>
+
+
+
 # Upgrading to v1.59.0
 
 ## Device name lookup over federation has been disabled by default
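
The deduplication step described in the v1.60.0 upgrade notes can be rehearsed on a throwaway database before touching a real one. The following is a minimal sketch using Python's built-in `sqlite3` module and an in-memory table. It is not part of this commit: the two-column table is a simplified stand-in for Synapse's real schema, `find_duplicate_edges` and `demo` are hypothetical helper names, and the duplicate check is a plain `GROUP BY` query, not the queries from the linked issue comment. The `DELETE` statement is the SQLite one from the notes, and it assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3
from typing import List, Tuple

# The SQLite DELETE statement from the upgrade notes: for every
# (state_group, prev_state_group) pair, keep the row with the lowest rowid
# and delete all other copies.
DEDUPE_SQL = """
DELETE FROM state_group_edges WHERE (rowid, state_group, prev_state_group) IN (
  SELECT row_id, state_group, prev_state_group
  FROM (
    SELECT
      rowid AS row_id,
      MIN(rowid) OVER (PARTITION BY state_group, prev_state_group) AS min_row_id,
      state_group,
      prev_state_group
    FROM state_group_edges
  )
  WHERE row_id <> min_row_id
)
"""


def find_duplicate_edges(conn: sqlite3.Connection) -> List[Tuple[int, int, int]]:
    """Return (state_group, prev_state_group, copies) for each duplicated edge.

    Hypothetical check query, written for this sketch only.
    """
    return conn.execute(
        "SELECT state_group, prev_state_group, COUNT(*) "
        "FROM state_group_edges "
        "GROUP BY state_group, prev_state_group "
        "HAVING COUNT(*) > 1 "
        "ORDER BY state_group, prev_state_group"
    ).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for the real table (assumption).
    conn.execute(
        "CREATE TABLE state_group_edges ("
        "state_group BIGINT NOT NULL, prev_state_group BIGINT NOT NULL)"
    )
    # Simulate a backup that was restored more than once.
    conn.executemany(
        "INSERT INTO state_group_edges VALUES (?, ?)",
        [(2, 1), (2, 1), (3, 2), (4, 3), (4, 3), (4, 3)],
    )
    before = find_duplicate_edges(conn)
    conn.execute(DEDUPE_SQL)
    conn.commit()
    after = find_duplicate_edges(conn)
    remaining = conn.execute(
        "SELECT state_group, prev_state_group FROM state_group_edges "
        "ORDER BY state_group"
    ).fetchall()
    # With the duplicates gone, the unique index can be created.
    conn.execute(
        "CREATE UNIQUE INDEX state_group_edges_unique_idx "
        "ON state_group_edges (state_group, prev_state_group)"
    )
    conn.close()
    return before, after, remaining
```

Calling `demo()` returns the duplicated pairs found before the cleanup, the (empty) list found afterwards, and the surviving rows. On a real homeserver database, take a backup and stop Synapse before running any `DELETE`.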

synapse/storage/background_updates.py

Lines changed: 15 additions & 0 deletions
@@ -535,6 +535,7 @@ def register_background_index_update(
         where_clause: Optional[str] = None,
         unique: bool = False,
         psql_only: bool = False,
+        replaces_index: Optional[str] = None,
     ) -> None:
         """Helper for store classes to do a background index addition
 
@@ -554,6 +555,8 @@ def register_background_index_update(
             unique: true to make a UNIQUE index
             psql_only: true to only create this index on psql databases (useful
                 for virtual sqlite tables)
+            replaces_index: The name of an index that this index replaces.
+                The named index will be dropped upon completion of the new index.
         """
 
         def create_index_psql(conn: Connection) -> None:
@@ -585,6 +588,12 @@ def create_index_psql(conn: Connection) -> None:
                 }
                 logger.debug("[SQL] %s", sql)
                 c.execute(sql)
+
+                if replaces_index is not None:
+                    # We drop the old index as the new index has now been created.
+                    sql = f"DROP INDEX IF EXISTS {replaces_index}"
+                    logger.debug("[SQL] %s", sql)
+                    c.execute(sql)
             finally:
                 conn.set_session(autocommit=False)  # type: ignore
 
@@ -613,6 +622,12 @@ def create_index_sqlite(conn: Connection) -> None:
             logger.debug("[SQL] %s", sql)
             c.execute(sql)
 
+            if replaces_index is not None:
+                # We drop the old index as the new index has now been created.
+                sql = f"DROP INDEX IF EXISTS {replaces_index}"
+                logger.debug("[SQL] %s", sql)
+                c.execute(sql)
+
         if isinstance(self.db_pool.engine, engines.PostgresEngine):
             runner: Optional[Callable[[Connection], None]] = create_index_psql
         elif psql_only:
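
The `replaces_index` behaviour added in this file boils down to "build the new index first, then drop the one it supersedes". The sketch below shows that ordering in isolation with Python's built-in `sqlite3` module. It is an illustration under stated assumptions, not Synapse's implementation: `create_index`, `index_names` and `demo_replace` are hypothetical helpers, the table is a simplified stand-in, and the Postgres-specific handling in the real code (autocommit, concurrent index builds) is left out.

```python
import sqlite3
from typing import List, Optional, Sequence


def create_index(
    conn: sqlite3.Connection,
    index_name: str,
    table: str,
    columns: Sequence[str],
    unique: bool = False,
    replaces_index: Optional[str] = None,
) -> None:
    """Create an index and, once it exists, drop the index it replaces.

    Identifiers are interpolated into the SQL text, so they must be trusted
    constants and never user input.
    """
    unique_sql = "UNIQUE " if unique else ""
    conn.execute(
        f"CREATE {unique_sql}INDEX IF NOT EXISTS {index_name} "
        f"ON {table} ({', '.join(columns)})"
    )
    if replaces_index is not None:
        # The old index is only dropped after the new one has been created,
        # so queries never run without a usable index.
        conn.execute(f"DROP INDEX IF EXISTS {replaces_index}")


def index_names(conn: sqlite3.Connection, table: str) -> List[str]:
    """List the names of all indexes currently defined on `table`."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = ? ORDER BY name",
        (table,),
    ).fetchall()
    return [row[0] for row in rows]


def demo_replace():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE state_group_edges ("
        "state_group BIGINT NOT NULL, prev_state_group BIGINT NOT NULL)"
    )
    # The old, non-unique index on (state_group) only.
    create_index(conn, "state_group_edges_idx", "state_group_edges", ["state_group"])
    before = index_names(conn, "state_group_edges")
    create_index(
        conn,
        "state_group_edges_unique_idx",
        "state_group_edges",
        ["state_group", "prev_state_group"],
        unique=True,
        replaces_index="state_group_edges_idx",
    )
    after = index_names(conn, "state_group_edges")
    conn.close()
    return before, after
```

Because the new index on `(state_group, prev_state_group)` has `state_group` as its leading column, it can serve the lookups that the old single-column index served, which is why dropping the old one afterwards is reasonable.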

synapse/storage/databases/state/bg_updates.py

Lines changed: 16 additions & 0 deletions
@@ -195,6 +195,7 @@ class StateBackgroundUpdateStore(StateGroupBackgroundUpdateStore):
     STATE_GROUP_DEDUPLICATION_UPDATE_NAME = "state_group_state_deduplication"
     STATE_GROUP_INDEX_UPDATE_NAME = "state_group_state_type_index"
     STATE_GROUPS_ROOM_INDEX_UPDATE_NAME = "state_groups_room_id_idx"
+    STATE_GROUP_EDGES_UNIQUE_INDEX_UPDATE_NAME = "state_group_edges_unique_idx"
 
     def __init__(
         self,
@@ -217,6 +218,21 @@ def __init__(
             columns=["room_id"],
         )
 
+        # `state_group_edges` can cause severe performance issues if duplicate
+        # rows are introduced, which can accidentally be done by well-meaning
+        # server admins when trying to restore a database dump, etc.
+        # See https://github.com/matrix-org/synapse/issues/11779.
+        # Introduce a unique index to guard against that.
+        self.db_pool.updates.register_background_index_update(
+            self.STATE_GROUP_EDGES_UNIQUE_INDEX_UPDATE_NAME,
+            index_name="state_group_edges_unique_idx",
+            table="state_group_edges",
+            columns=["state_group", "prev_state_group"],
+            unique=True,
+            # The old index was on (state_group) and was not unique.
+            replaces_index="state_group_edges_idx",
+        )
+
     async def _background_deduplicate_state(
         self, progress: dict, batch_size: int
     ) -> int:
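
To see how a unique index on `(state_group, prev_state_group)` guards against the problem described in the comment above, the following sketch reproduces both failure modes on an in-memory SQLite database: inserting a duplicate edge once the index exists, and trying to build the index when duplicates are already present. It is only an illustration. The table definition is a simplified stand-in and the helper names are hypothetical; none of this is Synapse code.

```python
import sqlite3

CREATE_UNIQUE_INDEX = (
    "CREATE UNIQUE INDEX state_group_edges_unique_idx "
    "ON state_group_edges (state_group, prev_state_group)"
)


def new_table() -> sqlite3.Connection:
    """Open an in-memory database with a simplified state_group_edges table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE state_group_edges ("
        "state_group BIGINT NOT NULL, prev_state_group BIGINT NOT NULL)"
    )
    return conn


def insert_edge(conn: sqlite3.Connection, state_group: int, prev: int) -> None:
    conn.execute("INSERT INTO state_group_edges VALUES (?, ?)", (state_group, prev))


def duplicate_insert_error() -> str:
    """With the index in place, a second copy of an edge is rejected."""
    conn = new_table()
    conn.execute(CREATE_UNIQUE_INDEX)
    insert_edge(conn, 2, 1)
    try:
        insert_edge(conn, 2, 1)
    except sqlite3.IntegrityError as e:
        return str(e)
    return ""


def index_creation_error() -> str:
    """With duplicates already stored, building the index itself fails."""
    conn = new_table()
    insert_edge(conn, 2, 1)
    insert_edge(conn, 2, 1)
    try:
        conn.execute(CREATE_UNIQUE_INDEX)
    except sqlite3.IntegrityError as e:
        return str(e)
    return ""
```

The second case is the situation the v1.60.0 upgrade notes warn about: the background update cannot complete until the duplicate rows have been removed.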
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+/* Copyright 2022 The Matrix.org Foundation C.I.C
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+INSERT INTO background_updates (ordering, update_name, progress_json) VALUES
+  (7008, 'state_group_edges_unique_idx', '{}');
