Support zero-downtime vectorization on PgvectorDocumentStore #1696

Open
antrix opened this issue May 3, 2025 · 0 comments
Labels
feature request Ideas to improve an integration

Comments

antrix commented May 3, 2025

Is your feature request related to a problem? Please describe.

In our current use case, we always need to recreate the table when running vectorization and inserting new documents into PgvectorDocumentStore. As a result, while this operation is running, any RAG pipelines that depend on the affected document store are effectively "offline". We have to either turn off those pipelines or accept that they serve partial data.

Describe the solution you'd like

What would be cool is if there were an option to "swap" document stores. I am imagining a process like this: when we need to run vectorization, we create a new document store as a "temp" store and insert all new documents into it. When it is ready, we ask Haystack to switch the real store with the temp store, and then delete the old data. Behind the scenes, this is essentially a set of PG table renames done atomically.

Not sure what the API would look like, to be honest!
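
To make the idea a bit more concrete, here is a rough sketch of what the rename step could look like at the SQL level. This is only an illustration and not an existing Haystack API: the function name `build_swap_statements`, the `_old` suffix, and the table names are all made up for this example. It relies on PostgreSQL DDL being transactional, so both renames become visible to readers at the same time on commit.

```python
import re

# Plain unquoted PostgreSQL identifiers only; anything else is rejected so the
# names can be interpolated into SQL text without quoting.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def _check_identifier(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsupported table name: {name!r}")
    return name


def build_swap_statements(live_table: str, temp_table: str, retired_suffix: str = "_old") -> list[str]:
    """Return SQL statements that swap temp_table into the place of live_table.

    Hypothetical helper for illustration; all names here are assumptions.
    """
    live = _check_identifier(live_table)
    temp = _check_identifier(temp_table)
    retired = _check_identifier(live_table + retired_suffix)
    return [
        "BEGIN",
        # Move the current table out of the way...
        f"ALTER TABLE {live} RENAME TO {retired}",
        # ...put the freshly vectorized table in its place...
        f"ALTER TABLE {temp} RENAME TO {live}",
        # ...and drop the old data, all within the same transaction.
        f"DROP TABLE {retired}",
        "COMMIT",
    ]


if __name__ == "__main__":
    for statement in build_swap_statements("haystack_documents", "haystack_documents_tmp"):
        print(statement + ";")
```

The statements would be run in order on a single connection. Two caveats I am aware of: `ALTER TABLE ... RENAME` takes a brief exclusive lock on the table, and renaming a table does not rename its indexes, so the index names of the temp store would need some handling as well.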

Describe alternatives you've considered
Given the current implementation of PgvectorDocumentStore, I didn't find any way to swap two stores.

Additional context
An alternative we could try would be to just insert documents in the store with the correct overwrite policy. The challenge is that in our setup, we don't have stable ids for the documents. So we can't reliably de-dup new inserts into the store.

@antrix antrix added the feature request Ideas to improve an integration label May 3, 2025