Is your feature request related to a problem? Please describe.
In our current use case, we always need to recreate the table when running vectorization and inserting new documents into PgvectorDocumentStore. As a result, while this operation is in progress, any RAG pipelines that depend on the affected document store are effectively "offline": we either have to turn those pipelines off or accept that they serve partial data.
Describe the solution you'd like
It would be useful to have an option to "swap" document stores. I am imagining a process like this: when we need to run vectorization, we create a new document store as a "temp" store and insert all the new documents into it. When it is ready, we ask Haystack to swap the real store with the temp store, and then delete the temp store. Behind the scenes, this would essentially be an atomic rename of the PostgreSQL tables.
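To make the "atomic table rename" idea concrete, here is a minimal sketch of the SQL such a swap could run. It is not an existing Haystack or PgvectorDocumentStore API; the function names and the table names `haystack_documents_tmp` / `haystack_documents_old` are hypothetical. It relies on PostgreSQL DDL being transactional: both renames commit together, and because `ALTER TABLE ... RENAME` takes an exclusive lock, concurrent queries wait briefly instead of seeing a missing or half-filled table.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier (case-sensitive), doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_swap_statements(live_table: str, temp_table: str, backup_table: str) -> List[str]:
    """Hypothetical helper: SQL that swaps temp_table into live_table's name.

    Both renames run in one transaction, so readers see either the old
    table or the fully populated new one under live_table's name.
    """
    live, temp, backup = (quote_ident(t) for t in (live_table, temp_table, backup_table))
    return [
        "BEGIN",
        f"ALTER TABLE {live} RENAME TO {backup}",  # move the current data aside
        f"ALTER TABLE {temp} RENAME TO {live}",    # promote the freshly built table
        "COMMIT",
        f"DROP TABLE {backup}",                    # remove old data only after the swap commits
    ]
```

With psycopg 3, the statements could be run one by one on a connection opened with `autocommit=True` (so the explicit `BEGIN`/`COMMIT` control the transaction), e.g. `for stmt in build_swap_statements(...): conn.execute(stmt)`. One caveat worth checking: renaming a table does not rename its indexes, so if the store creates named indexes (such as an HNSW index), the temp store would need different index names to avoid a clash.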
Not sure what the API would look like, to be honest!
Describe alternatives you've considered
Given the current implementation of PgvectorDocumentStore, I didn't find any way to swap two stores.
Additional context
An alternative we could try would be to simply insert documents into the existing store with the appropriate overwrite policy. The challenge is that in our setup, we don't have stable ids for the documents, so we can't reliably de-duplicate new inserts into the store.
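To illustrate why an overwrite policy (such as Haystack's `DuplicatePolicy.OVERWRITE`) only de-duplicates when ids are stable, here is a toy in-memory model: a plain dict keyed by document id, not Haystack's actual implementation. The `upsert` helper and the fixed ids `doc-0`, `doc-1` are hypothetical.

```python
import uuid
from typing import Dict, List, Tuple


def upsert(store: Dict[str, str], docs: List[Tuple[str, str]]) -> None:
    """Toy overwrite-by-id insert: an existing entry with the same id is replaced."""
    for doc_id, content in docs:
        store[doc_id] = content


contents = ["Refund policy v2", "Shipping times"]

# Two ingestion runs over the same content, minting fresh random ids each run:
# the overwrite never finds a matching id, so every run adds duplicates.
random_store: Dict[str, str] = {}
for _ in range(2):
    upsert(random_store, [(str(uuid.uuid4()), c) for c in contents])

# Same two runs with ids that stay the same across runs (hypothetical fixed ids):
# the second run overwrites the first instead of duplicating it.
stable_store: Dict[str, str] = {}
for _ in range(2):
    upsert(stable_store, [(f"doc-{i}", c) for i, c in enumerate(contents)])

print(len(random_store), len(stable_store))  # 4 2
```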