Extending MongoDBAtlasDocumentStore to support custom schema #690
Labels: contributions wanted! (Looking for external contributions), feature request (Ideas to improve an integration), integration:mongodb-atlas, P3
Is your feature request related to a problem? Please describe.
The current implementation of MongoDBAtlasDocumentStore supports only one specific MongoDB document schema. Content is expected to be stored in the `content` field, and metadata must be within a `meta` subdocument. This schema requirement is enforced by the `$project` stage in the aggregation pipeline executed by the `_embedding_retrieval` function.
This tightly couples the Haystack Document representation with the database schema, which can be inconvenient. I have a vector store in MongoDB with an existing schema defined when I was using LangChain. Specifically, I have the document's content stored in a `text` field, and I have some metadata stored in different fields of a MongoDB document (like `source`, which stores the original document location reference). I would prefer to avoid migrating to a new schema dictated by MongoDBAtlasDocumentStore.
Describe the solution you'd like
I propose adding the ability to partially and optionally override the `$project` stage of the aggregation pipeline, while retaining the existing behavior as the default. For example, the MongoDBAtlasDocumentStore initializer could accept the overrides and store them as `self.content_field_key` and `self.meta_project_mapping`, which would then be used in the `$project` aggregation pipeline stage. What do you think?
Describe alternatives you've considered
I extended MongoDBAtlasDocumentStore in my project and made the described change. While this approach works, I was wondering if it would be beneficial to include it in the library.
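A minimal sketch of how such an override could build the `$project` stage, assuming a content field key and a meta mapping like the ones described above. The helper name `build_project_stage` is hypothetical, and the set of projected fields (`embedding`, `score`) is an assumption about what the retrieval pipeline returns, not a copy of the library code.

```python
from typing import Any, Dict, Optional


def build_project_stage(
    content_field_key: str = "content",
    meta_project_mapping: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Build a $project stage that maps a stored schema onto content/meta.

    Hypothetical helper: meta_project_mapping maps a key inside the
    resulting `meta` subdocument to the stored field it is read from.
    """
    # Default behavior: pass the stored `content` field through unchanged.
    # Otherwise expose the custom field under the name `content`.
    content: Any = 1 if content_field_key == "content" else f"${content_field_key}"

    if meta_project_mapping:
        # Assemble the `meta` subdocument from individual stored fields.
        meta: Any = {key: f"${field}" for key, field in meta_project_mapping.items()}
    else:
        # Default behavior: pass the stored `meta` subdocument through.
        meta = 1

    return {
        "$project": {
            "_id": 0,
            "content": content,
            "meta": meta,
            "embedding": 1,  # assumed to be part of the projection
            "score": {"$meta": "vectorSearchScore"},
        }
    }


# Schema from this issue: content in `text`, metadata in a `source` field.
custom_stage = build_project_stage("text", {"source": "source"})
default_stage = build_project_stage()
```

With no arguments the stage keeps the current `content`/`meta` layout, so existing users would see no change.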
Additional context
I can submit a PR :)