Support Sparse Embedding Retrieval #7355

Closed
anakin87 opened this issue Mar 13, 2024 · 3 comments
Labels
type:feature New feature or request

Comments

@anakin87
Member

anakin87 commented Mar 13, 2024

Sparse embedding retrieval is a feature the community has been asking for, and it is currently supported by Qdrant and Pinecone.

Update
I experimented with the complete round trip: from Document to sparse embedding stored in Qdrant/Pinecone and then querying (notebook).

What we need to do:

- [x] Investigate/design the integration
- [x] Introduce SparseEmbedding class and add it to Document
- [ ] https://github.com/deepset-ai/haystack-core-integrations/issues/604
- [x] release the SparseEmbedding class in 2.0.1
- [x] Introduce a first Sparse Embedder (https://github.com/deepset-ai/haystack-core-integrations/pull/579)
- [x] Make Qdrant write sparse embeddings (https://github.com/deepset-ai/haystack-core-integrations/pull/578)
- [x] Introduce Qdrant Sparse Embedding Retriever (https://github.com/deepset-ai/haystack-core-integrations/pull/578)
- [x] non-urgent: understand the problems related to Qdrant Hybrid Retriever
- [ ] https://github.com/deepset-ai/haystack-core-integrations/issues/695
- [ ] https://github.com/deepset-ai/haystack-core-integrations/issues/660
- [ ] https://github.com/deepset-ai/haystack-core-integrations/pull/675
- [x] The feature was announced through social media
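
For readers who want a feel for the round trip mentioned in the update (Document to sparse embedding, storage, then querying), here is a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: `SparseVector` and `InMemorySparseStore` are hypothetical names, not the Haystack `SparseEmbedding` class or the Qdrant/Pinecone APIs. It only shows the general idea of storing index/value pairs and ranking by sparse dot product.

```python
from dataclasses import dataclass


@dataclass
class SparseVector:
    """Hypothetical stand-in for a sparse embedding: parallel index/value lists."""

    indices: list[int]
    values: list[float]

    def dot(self, other: "SparseVector") -> float:
        # Only dimensions present in both vectors contribute to the score.
        other_map = dict(zip(other.indices, other.values))
        return sum(v * other_map.get(i, 0.0) for i, v in zip(self.indices, self.values))


class InMemorySparseStore:
    """Hypothetical toy store; a real vector database would replace this."""

    def __init__(self) -> None:
        self._docs: dict[str, SparseVector] = {}

    def write(self, doc_id: str, vector: SparseVector) -> None:
        self._docs[doc_id] = vector

    def query(self, query_vector: SparseVector, top_k: int = 3) -> list[tuple[str, float]]:
        # Score every stored document, drop non-matches, return the best top_k.
        scored = [(doc_id, query_vector.dot(vec)) for doc_id, vec in self._docs.items()]
        scored = [pair for pair in scored if pair[1] > 0]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = InMemorySparseStore()
store.write("a", SparseVector(indices=[1, 5], values=[0.5, 2.0]))
store.write("b", SparseVector(indices=[2, 5], values=[1.0, 0.5]))
store.write("c", SparseVector(indices=[9], values=[1.0]))

# Query with a single active dimension (5); "c" does not share it and is dropped.
results = store.query(SparseVector(indices=[5], values=[1.0]))
```

In this toy setup, `results` ranks document "a" ahead of "b" because "a" carries the larger weight on the shared dimension.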
@anakin87 anakin87 self-assigned this Mar 13, 2024
@lambda-science
Contributor

Note:
As a 1st step: we now have a working sparse embedder in Haystack through the FastEmbed integration
deepset-ai/haystack-core-integrations#579

@lambda-science
Contributor

Btw it would be cool to have a general BM25 embedder in the core Haystack repo instead of relying only on the SPLADE embedder from FastEmbed :) especially as you already have haystack-bm25
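
To make the BM25 embedder idea a bit more concrete, here is a rough sketch of how BM25 weights could be expressed as sparse vectors. This is only an assumption about one possible design, not an existing Haystack or haystack-bm25 API: `BM25SparseEncoder` and `sparse_dot` are hypothetical names, tokenization is a naive whitespace split, and tokens are used directly as keys instead of integer indices. Documents get BM25 term weights, queries get a weight of 1.0 per unique term, so their dot product equals a BM25 score (the variant with `log(1 + ...)` IDF).

```python
import math
from collections import Counter


class BM25SparseEncoder:
    """Hypothetical sketch: encodes text as {token: weight} sparse vectors."""

    def __init__(self, k1: float = 1.5, b: float = 0.75) -> None:
        self.k1 = k1
        self.b = b
        self.idf: dict[str, float] = {}
        self.avg_len = 0.0

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Naive whitespace tokenizer, chosen only to keep the sketch short.
        return text.lower().split()

    def fit(self, corpus: list[str]) -> "BM25SparseEncoder":
        if not corpus:
            raise ValueError("corpus must contain at least one document")
        tokenized = [self.tokenize(doc) for doc in corpus]
        n_docs = len(tokenized)
        self.avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
        # Document frequency: number of documents containing each token.
        doc_freq = Counter(tok for tokens in tokenized for tok in set(tokens))
        self.idf = {
            tok: math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            for tok, df in doc_freq.items()
        }
        return self

    def encode_document(self, text: str) -> dict[str, float]:
        tokens = self.tokenize(text)
        # Length normalization term from the BM25 formula.
        norm = self.k1 * (1 - self.b + self.b * len(tokens) / self.avg_len)
        return {
            tok: self.idf.get(tok, 0.0) * tf * (self.k1 + 1) / (tf + norm)
            for tok, tf in Counter(tokens).items()
        }

    def encode_query(self, text: str) -> dict[str, float]:
        # Each unique query term contributes once to the score.
        return {tok: 1.0 for tok in set(self.tokenize(text))}


def sparse_dot(a: dict[str, float], b: dict[str, float]) -> float:
    return sum(weight * b.get(tok, 0.0) for tok, weight in a.items())


corpus = [
    "sparse vectors in qdrant",
    "dense vectors everywhere",
    "bm25 is a sparse method",
]
encoder = BM25SparseEncoder().fit(corpus)
query = encoder.encode_query("sparse qdrant")
scores = [sparse_dot(query, encoder.encode_document(doc)) for doc in corpus]
```

With this toy corpus, the first document scores highest because it matches both query terms, and the second scores zero because it matches neither. One caveat of this design is that the IDF statistics depend on the fitted corpus, so the encoder would need refitting as the collection changes.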

@lambda-science
Contributor

Note:
As a 2nd step: the Qdrant integration could now support sparse vectors and be compatible with the FastEmbed sparse embedder from above 👀 deepset-ai/haystack-core-integrations#578

@anakin87 anakin87 added the type:feature New feature or request label Mar 19, 2024
@anakin87 anakin87 changed the title from "Design the support for Sparse Embedding Retrieval" to "Support for Sparse Embedding Retrieval" Mar 22, 2024
@anakin87 anakin87 changed the title from "Support for Sparse Embedding Retrieval" to "Support Sparse Embedding Retrieval" Mar 22, 2024
2 participants