This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Fix tests for PostgreSQL 14 behavior change. #14310

Merged
merged 3 commits into from
Oct 27, 2022
1 change: 1 addition & 0 deletions changelog.d/14310.feature
@@ -0,0 +1 @@
+Allow use of postgres and SQLite full-text search operators in search queries.
5 changes: 2 additions & 3 deletions synapse/storage/databases/main/search.py
@@ -824,9 +824,8 @@ def _tokenize_query(query: str) -> TokenList:
     in_phrase = False
     parts = deque(query.split('"'))
     for i, part in enumerate(parts):
-        # The contents inside double quotes is treated as a phrase, a trailing
-        # double quote is not implied.
-        in_phrase = bool(i % 2) and i != (len(parts) - 1)
+        # The contents inside double quotes is treated as a phrase.
+        in_phrase = bool(i % 2)
Comment on lines -827 to +828
Contributor

To check: this means that a trailing quote is now implied?

Member Author

Yes. 👍
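
To make the change discussed in this thread concrete, here is a minimal sketch of the quote-splitting step only. It is not the Synapse implementation: `split_phrases` and the `imply_trailing_quote` flag are hypothetical names introduced here so the old and new rules can be compared side by side.

```python
import re
from typing import List, Tuple


def split_phrases(query: str, imply_trailing_quote: bool) -> List[Tuple[bool, List[str]]]:
    """Return (in_phrase, words) for each non-empty chunk of the query.

    Hypothetical helper for illustration; the flag switches between the
    old rule (False) and the new rule (True) shown in the diff.
    """
    parts = query.split('"')
    chunks = []
    for i, part in enumerate(parts):
        # Odd-numbered chunks sit after an opening double quote.
        in_phrase = bool(i % 2)
        if not imply_trailing_quote:
            # Old rule: the last chunk is never a phrase, because a
            # missing closing quote was not implied.
            in_phrase = in_phrase and i != (len(parts) - 1)
        # Pull out the individual words, discarding non-word characters.
        words = re.findall(r"[\w\-]+", part)
        if words:
            chunks.append((in_phrase, words))
    return chunks


old = split_phrases('"fox quick', imply_trailing_quote=False)
new = split_phrases('"fox quick', imply_trailing_quote=True)
print(old)
print(new)
```

Under the old rule the unterminated chunk is handled as plain words; under the new rule it is flagged as a phrase, which is what the reply above confirms. A properly closed query such as `"fox" quick` is split the same way under both rules.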


         # Pull out the individual words, discarding any non-word characters.
         words = deque(re.findall(r"([\w\-]+)", part, re.UNICODE))
16 changes: 12 additions & 4 deletions tests/storage/test_room_search.py
@@ -239,7 +239,6 @@ class MessageSearchTest(HomeserverTestCase):
         ("fox -nope", (True, False)),
         ("fox -brown", (False, True)),
         ('"fox" quick', True),
-        ('"fox quick', True),
         ('"quick brown', True),
         ('" quick "', True),
         ('" nope"', False),
@@ -269,6 +268,15 @@ def prepare(
         response = self.helper.send(self.room_id, self.PHRASE, tok=self.access_token)
         self.assertIn("event_id", response)

+        # The behaviour of a missing trailing double quote changed in PostgreSQL 14
+        # from ignoring the initial double quote to treating it as a phrase.
Comment on lines +271 to +272
Contributor

To check: we've changed our parser to mirror this behaviour?

Member Author

Yes, so SQLite will match the behavior of PostgreSQL 14. PostgreSQL 11-13 will still use whatever their internal function call does. (And PostgreSQL 10 falls back to plainto_tsquery anyway, so let's just ignore that.)
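
A rough sketch of the backend split described in this reply. It assumes the engine version is PostgreSQL's integer version number, where 14.0 is 140000 (consistent with the `_version < 140000` check in the test). `treats_unterminated_quote_as_phrase` is a hypothetical helper, and using `None` to stand for SQLite is a convention of this sketch only.

```python
from typing import Optional


def treats_unterminated_quote_as_phrase(pg_version_num: Optional[int]) -> bool:
    """Say whether a query like '"fox quick' is searched as a phrase.

    pg_version_num is assumed to be PostgreSQL's integer version number,
    or None when the backend is SQLite (an assumption of this sketch).
    """
    if pg_version_num is None:
        # SQLite goes through the custom tokenizer, which now mirrors
        # PostgreSQL 14 and implies the trailing quote.
        return True
    # PostgreSQL 14 and later treat the unterminated quote as a phrase;
    # earlier versions ignore the initial double quote.
    return pg_version_num >= 140000


for label, version in (("SQLite", None), ("PostgreSQL 13.8", 130008), ("PostgreSQL 14.5", 140005)):
    print(label, treats_unterminated_quote_as_phrase(version))
```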

+        main_store = homeserver.get_datastores().main
+        found = False
+        if isinstance(main_store.database_engine, PostgresEngine):
+            assert main_store.database_engine._version is not None
+            found = main_store.database_engine._version < 140000
+        self.COMMON_CASES.append(('"fox quick', (found, True)))
Contributor

To check: I think this means that:

  • the first "chunk" of the phrase (now fox without a leading quote) is found on PG < 14 and SQLite, but not on PG >= 14
  • the second chunk (quick) is found in all cases

Member Author

Not really, what it means is that:

  • fox and quick are both found on PG < 14
  • The phrase "fox quick" is not found on PG >= 14 and SQLite.
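
The difference between the two outcomes in this reply can be sketched with a toy matcher. This is not how PostgreSQL or SQLite full-text search works internally; `matches_all_words`, `matches_phrase`, and the sample text are illustrative assumptions. The point is only that two words can both occur in a message while the same words, read as a phrase, do not occur in that order.

```python
from typing import List


def matches_all_words(text: str, words: List[str]) -> bool:
    """True if every word occurs somewhere in the text (word search)."""
    tokens = text.lower().split()
    return all(word in tokens for word in words)


def matches_phrase(text: str, words: List[str]) -> bool:
    """True if the words occur consecutively, in order (phrase search)."""
    tokens = text.lower().split()
    n = len(words)
    return any(tokens[i : i + n] == words for i in range(len(tokens) - n + 1))


# Assumed sample message, chosen so that "quick" comes before "fox".
text = "the quick brown fox"

# Initial quote ignored: "fox" and "quick" are searched as separate words.
print(matches_all_words(text, ["fox", "quick"]))
# Trailing quote implied: "fox quick" is searched as a phrase.
print(matches_phrase(text, ["fox", "quick"]))
```

With the sample text, the word search succeeds and the phrase search fails, which mirrors why the expected result for `"fox quick` depends on how the backend reads the unterminated quote.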


     def test_tokenize_query(self) -> None:
         """Test the custom logic to tokenize a user's query."""
         cases = (
@@ -280,9 +288,9 @@ def test_tokenize_query(self) -> None:
             ("fox -brown", ["fox", SearchToken.Not, "brown"]),
             ("- fox", [SearchToken.Not, "fox"]),
             ('"fox" quick', [Phrase(["fox"]), SearchToken.And, "quick"]),
-            # No trailing double quoe.
-            ('"fox quick', ["fox", SearchToken.And, "quick"]),
-            ('"-fox quick', [SearchToken.Not, "fox", SearchToken.And, "quick"]),
+            # No trailing double quote.
+            ('"fox quick', [Phrase(["fox", "quick"])]),
+            ('"-fox quick', [Phrase(["-fox", "quick"])]),
             ('" quick "', [Phrase(["quick"])]),
             (
                 'q"uick brow"n',