Pgvector hybrid #576

jlonge4 · 2024-03-13T01:01:00Z

I was hoping to implement hybrid search within the pgvector integration and use RRF (Reciprocal Rank Fusion) for merging. This has one piece missing: the user query itself (commented out in line 571). I didn't want to open an issue/enhancement without at least providing a starting point. Let me know what you think @anakin87 @vblagoje @masci

Inspiration -> https://github.com/pgvector/pgvector-python/blob/master/examples/hybrid_search_rrf.py
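
For readers unfamiliar with RRF, here is a minimal sketch of the merging step in plain Python. It is not the code from this PR: the function name `reciprocal_rank_fusion` and the constant `k = 60` are assumptions for illustration (60 is a commonly used default). Each document gets a score of 1 / (k + rank) from every ranked list it appears in, and the scores are summed.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document ids into one ranking.

    `k` is a smoothing constant (assumed default: 60); ranks start at 1.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document contributes 1 / (k + rank) for each list it appears in
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id to keep the output deterministic
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a vector search and a keyword search
vector_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]
merged = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents that appear near the top of both lists (here "b" and "a") end up ahead of documents that appear in only one.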

CLAassistant · 2024-03-13T01:01:06Z

All committers have signed the CLA.

anakin87 · 2024-03-13T07:54:08Z

Hey, @jlonge4!

Thanks for your idea/initial implementation.

I know this is a missing feature of this Document Store.
I'll investigate and get back to you in a while...

jlonge4 · 2024-03-13T12:09:03Z

@anakin87 thanks a lot, let me know if I can do anything further!

anakin87 · 2024-03-21T14:51:17Z

Hey @jlonge4, sorry for the long wait... Tomorrow I will take a proper look!

jlonge4 · 2024-03-21T18:37:04Z

@anakin87 no worries sir! Thank you 🙏🏼

anakin87 · 2024-03-22T10:35:58Z

Before talking about hybrid retrieval, we should introduce keyword retrieval. Then we can combine vector+keyword -> hybrid retrieval.

Keyword Retrieval

TO DO (brainstorming mode 🙂)

add some configurations to the Document Store (language)
create another index on the DB as done here
introduce a method _keyword_retrieval in the Document Store
create a KeywordRetriever (unsure about the name) that accepts a query and calls the previous method

WDYT?
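
To make the steps above concrete, here is a rough sketch of the SQL such a keyword retrieval could rely on, built as Python strings. It only assumes standard PostgreSQL full-text search (`to_tsvector`, `plainto_tsquery`, `ts_rank_cd`, a GIN index). The table name, index name, `content` column, and both helper functions are hypothetical and not taken from this PR or from the Document Store.

```python
from typing import Tuple

# Hypothetical names, for illustration only
TABLE_NAME = "haystack_documents"
KEYWORD_INDEX_NAME = "haystack_keyword_index"


def build_keyword_index_sql(table: str = TABLE_NAME, language: str = "english") -> str:
    """Return a statement creating a GIN index over the tsvector of `content`."""
    # `table` and `language` are assumed to come from trusted configuration,
    # not from user input, because they are interpolated into the statement.
    return (
        f"CREATE INDEX IF NOT EXISTS {KEYWORD_INDEX_NAME} "
        f"ON {table} USING GIN (to_tsvector('{language}', content))"
    )


def build_keyword_query(
    query: str, top_k: int = 10, table: str = TABLE_NAME, language: str = "english"
) -> Tuple[str, tuple]:
    """Return a parameterized SELECT and its parameters for keyword retrieval."""
    sql = (
        f"SELECT id, content, ts_rank_cd(to_tsvector('{language}', content), q) AS score "
        f"FROM {table}, plainto_tsquery('{language}', %s) q "
        f"WHERE to_tsvector('{language}', content) @@ q "
        "ORDER BY score DESC LIMIT %s"
    )
    # The user query and the limit are passed as parameters, never interpolated
    return sql, (query, top_k)


index_sql = build_keyword_index_sql()
select_sql, params = build_keyword_query("quick brown fox", top_k=5)
```

A retriever component would then only need to pass the query string and `top_k` to the Document Store method that executes the statement with a database driver.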

jlonge4 · 2024-03-22T21:57:51Z

@anakin87 sounds like a great plan, bite-sized pieces are better! Made a few updates based on your thoughts.

anakin87 · 2024-03-27T16:45:27Z

Hey!

can you fix the linting error?
can you add some tests for this addition?

jlonge4 · 2024-03-27T21:26:11Z

@anakin87 Definitely, I am gonna do some local testing (might take a couple days) but will get there!

anakin87 · 2024-03-27T23:26:42Z

Take your time...
I will be off for a few days 🙂

anakin87 · 2024-04-03T14:58:43Z

Hey, I see that tests are no longer running for some reason.

I would suggest focusing on Keyword Retrieval and then thinking about Hybrid Retrieval (maybe in another PR).

Let me know if you need any help or suggestions...

jlonge4 · 2024-04-03T16:28:37Z

@anakin87 you are right, I believe I'll kill this PR and do as you suggested 👍🏼

kanenorman · 2024-10-17T00:21:19Z

Hi @anakin87, sorry to resurface an old issue.

Are there any plans to introduce a PgvectorHybridRetriever, or is the expectation that hybrid retrieval should be implemented using a pipeline, similar to #738? I’ve noticed that some integrations, like QdrantHybridRetriever, already have hybrid retrievers. Does the core team prefer users implement hybrid retrieval through a pipeline instead?

anakin87 · 2024-10-17T06:49:46Z

Hello @kanenorman!

There is no set rule.

Our main goal is to provide users with hybrid retrieval capabilities, if available (using a Pipeline or not).

I would say that implementing a Hybrid Retriever makes sense especially when:

there is a significant difference in query times compared to using a hybrid retrieval Pipeline (because the DB runs a single optimized query and the Pipeline overhead is avoided)
the community requests this feature

implement hybrid search

4b789ee

jlonge4 requested a review from a team as a code owner March 13, 2024 01:01

jlonge4 requested review from anakin87 and removed request for a team March 13, 2024 01:01

github-actions bot added the integration:pgvector label Mar 13, 2024

github-actions bot added the type:documentation Improvements or additions to documentation label Mar 20, 2024

jlonge4 added 5 commits March 23, 2024 09:58

add language param to docstore / create index on DB func

91ee610

add language param to docstore / create index on DB func

d2334b0

add bones of keyword_retrieval func

5f40d70

add bones of PgvectorKeywordRetriever

e11af85

add bones of PgvectorKeywordRetriever

d64fe0f

anakin87 self-assigned this Mar 25, 2024

jlonge4 added 2 commits March 27, 2024 17:04

fix linting

d60933d

ruff format

e7853dd

query change

7ab34c0

jlonge4 added 5 commits March 30, 2024 15:01

add keyword test

b0143e9

add keyword test

6760063

add keyword test

01cf28a

add keyword test

a2ddcc4

add keyword test

d9d07b2

jlonge4 added 3 commits March 30, 2024 16:44

add keyword test

700e80f

fixes

b4b4493

add language to init

a269fa1

jlonge4 closed this Apr 5, 2024

anakin87 removed their assignment Apr 5, 2024

anakin87 mentioned this pull request May 8, 2024

Pgvector: support for keyword retrieval #724

Closed
