Extending MongoDBAtlasDocumentStore to support custom schema #690

Open
verkhovin opened this issue Apr 24, 2024 · 1 comment
Labels
contributions wanted! (Looking for external contributions), feature request (Ideas to improve an integration), integration:mongodb-atlas, P3

Comments

@verkhovin
Is your feature request related to a problem? Please describe.
The current implementation of MongoDBAtlasDocumentStore supports only one specific MongoDB document schema. Content is expected to be stored in the content field, and metadata must live in a meta subdocument. This schema requirement is enforced by the $project stage of the aggregation pipeline executed by the _embedding_retrieval function:

    {
        "$vectorSearch": {
            "index": self.vector_search_index,
            "path": "embedding",
            "queryVector": query_embedding,
            "numCandidates": 100,
            "limit": top_k,
            "filter": filters,
        }
    },
    {
        "$project": {
            "_id": 0,
            "content": 1,
            "dataframe": 1,
            "blob": 1,
            "meta": 1,
            "embedding": 1,
            "score": {"$meta": "vectorSearchScore"},
        }
    }

This tightly couples the Haystack Document representation to the database schema, which can be inconvenient. I have a vector store in MongoDB with an existing schema that was defined when I was using LangChain. Specifically, the document's content is stored in a text field, and some metadata is stored in separate top-level fields of the MongoDB document (for example, source stores a reference to the original document's location). I would prefer to avoid migrating to a new schema dictated by MongoDBAtlasDocumentStore.
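To make the mismatch concrete, here is a minimal pure-Python sketch. The sample document and the helper `simulate_inclusion_projection` are hypothetical illustrations, not part of Haystack or MongoDB; the helper only mimics the inclusion part of the `$project` stage above (it ignores `_id` handling and the `score` meta field).

```python
# Hypothetical document in the shape described above:
# content lives in "text", metadata in top-level fields such as "source".
legacy_doc = {
    "text": "Haystack is an LLM framework.",
    "source": "docs/intro.md",
    "embedding": [0.1, 0.2, 0.3],
}

# Top-level fields that the current $project stage includes.
DEFAULT_INCLUDED_FIELDS = ["content", "dataframe", "blob", "meta", "embedding"]


def simulate_inclusion_projection(doc, fields):
    """Keep only the listed top-level fields that exist in the document."""
    return {name: doc[name] for name in fields if name in doc}


projected = simulate_inclusion_projection(legacy_doc, DEFAULT_INCLUDED_FIELDS)
print(projected)
```

With this schema only the embedding survives the projection, so neither the text nor the source reaches the resulting Document.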

Describe the solution you'd like
I propose making it possible to optionally override parts of the $project stage of the aggregation pipeline, while keeping the existing behavior as the default. For example, initializing the MongoDBAtlasDocumentStore could look like this:

MongoDBAtlasDocumentStore(
    database_name="db",
    collection_name="embedded_docs",
    vector_search_index="index",
    content_field_key="text",  # maps the "text" field in MongoDB to the Document's content
    meta_project_mapping={"source": "$source"},  # builds meta flexibly from fields of the MongoDB document
)

self.content_field_key and self.meta_project_mapping would then be used in the $project stage of the aggregation pipeline. What do you think?
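A minimal sketch of how the two proposed parameters could feed the `$project` stage. The function name `build_project_stage` is hypothetical, and `content_field_key` and `meta_project_mapping` are the proposed parameters from this issue, not existing library options. With the defaults, the result is identical to the stage currently hard-coded in `_embedding_retrieval`.

```python
from typing import Any, Dict, Optional


def build_project_stage(
    content_field_key: str = "content",
    meta_project_mapping: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a $project stage, optionally remapping content and meta.

    Hypothetical helper: the parameters mirror the proposal in this issue.
    """
    # Default: include "content" as-is. Otherwise compute "content" from
    # another field using a "$<field>" field-path expression.
    content_spec: Any = 1 if content_field_key == "content" else f"${content_field_key}"

    # Default: include the "meta" subdocument as-is. Otherwise build it from
    # the given mapping, e.g. {"source": "$source"}.
    meta_spec: Any = meta_project_mapping if meta_project_mapping else 1

    return {
        "$project": {
            "_id": 0,
            "content": content_spec,
            "dataframe": 1,
            "blob": 1,
            "meta": meta_spec,
            "embedding": 1,
            "score": {"$meta": "vectorSearchScore"},
        }
    }


default_stage = build_project_stage()
custom_stage = build_project_stage(
    content_field_key="text",
    meta_project_mapping={"source": "$source"},
)
print(custom_stage)
```

Keeping the output keys (`content`, `meta`, `embedding`, `score`) unchanged means the code that turns pipeline results into Documents would not need to know about the custom schema; only the projection differs.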

Describe alternatives you've considered
I extended MongoDBAtlasDocumentStore in my project and made the described change. While this approach works, I was wondering if it would be beneficial to include it in the library.

Additional context
I can submit a PR :)

@MetroCat69
Hi, can I take this issue? Can you give me links to similar PRs?
