.Net: Azure AI Search connector lacks key features #10880
Comments
@dluc at some point, I don't think it makes sense to ask the abstraction to support each and every feature of a specific store (Azure AI Search in this case) - the point of the abstraction is to cover commonalities, and if users want to use very store-specific features, they can use the native SDK. Having said that, we had discussions around this with @westey-m, and specifically features which are "creation-only" could make more sense via a provider-specific extensibility model in VectorStoreRecordDefinition; we'd basically e.g. add a [...]
Where I think we'd stop is when a provider-specific feature actually requires an API modification, e.g. when searching or inserting.
@dluc Can you provide more details on the following:
Context:
The suggestion at the start is to allow custom parameters, similar to how SK supports custom LLM parameters, for example the TopK, MinP and Mirostat params for Ollama, which don't exist for OpenAI. The stricter the interface, the fewer integrations we will enable, and these 3 seem to be a good compromise:
There are several different things above; I'll try to tease them apart and answer each one separately. So first, we are currently working on introducing a filter-only search, which doesn't accept a vector and doesn't do any similarity search. This is tracked by #10295, and makes sense as virtually all vector databases have some sort of support for this. While this covers criteria-only filtering, it doesn't cover pure full-text searching that isn't coupled to a vector search (full-text that's coupled to a vector search is already covered by hybrid search).
If we do go in the direction of supporting full-text-only search, I don't think some "custom params" bag is a good answer for this, since the search API method would still require you to pass in a vector property. Wouldn't it make more sense for the Azure AI Search connector to simply expose an additional API that does specifically full text search? This would not be part of the abstraction - at least not yet (since no other database supports this AFAIK) - but you'd be able to simply cast your IVectorStoreRecordCollection down to AzureAISearchVectorStoreRecordCollection, and then call that connector-specific API; it would have the exact signature that makes sense for that operation, rather than an untyped "extra parameters" bag. BTW that's another example of why you can't expect to be able to write the same code to interface with all connectors via the abstraction - actual capabilities simply are very different (in this case, a search type that's only supported on one database).
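To make the downcast idea concrete, here is a minimal sketch of the pattern. It is written in Python purely for brevity, and every name in it (`VectorCollection`, `FullTextCapableCollection`, `full_text_search`, `search_text`) is hypothetical; none of this is the actual Semantic Kernel or Azure AI Search API, and the keyword match is only a placeholder for a real full-text query.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class VectorCollection(ABC):
    """Hypothetical stand-in for the shared abstraction: vector search only."""

    @abstractmethod
    def vector_search(self, vector: list[float], top: int = 3) -> list[str]:
        ...


@dataclass
class FullTextCapableCollection(VectorCollection):
    """Hypothetical connector type that adds one typed, store-specific operation."""

    docs: dict[str, str] = field(default_factory=dict)

    def vector_search(self, vector: list[float], top: int = 3) -> list[str]:
        # Placeholder: a real connector would rank records by similarity.
        return list(self.docs)[:top]

    def full_text_search(self, query: str, top: int = 3) -> list[str]:
        # Connector-specific API with its own signature: no vector required.
        # A naive case-insensitive substring match stands in for real full-text search.
        needle = query.lower()
        return [key for key, text in self.docs.items() if needle in text.lower()][:top]


def search_text(collection: VectorCollection, query: str) -> list[str]:
    """Use the connector-specific API when the concrete type offers it."""
    if isinstance(collection, FullTextCapableCollection):
        return collection.full_text_search(query)
    raise NotImplementedError("this store has no full-text-only search")
```

The calling code holds the abstract type, checks for the concrete connector type, and only then calls the typed method, so the store-specific capability never leaks into the shared interface as an untyped parameter bag.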
This would mainly mean removing validation that at least one vector property is defined on the record, and making sure that e.g. the Azure AI Search Upsert APIs work when one is absent. I could see us doing this if it's necessary.
I don't think that's related to the rest of the conversation (best to have separate issues/discussions for separate problems). But in any case, base64 is simply a way to encode binary data, and should be an internal implementation detail of the connector as it sends the embedding to the server. In other words, the user shouldn't be doing base64-encoding of embeddings to hand them off to MEVD - they should be passing the .NET type which actually represents the embedding (e.g. a byte[] or a float[], wrapped by Embedding), and the connector should do whatever encoding/processing is necessary in order to send it to the server. That's also how we allow the user to write code that works against multiple databases. Beyond that, MEVD already supports different types of embeddings (in the form of float[], Single[], byte[]). We may not have implemented support for them in the Azure AI Search connector yet, but at least at the API level we've taken care not to restrict e.g. only to float[].
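As an illustration of keeping the encoding inside the connector, here is a small Python sketch. The `SketchConnector` class and the wire format (little-endian float32, then base64) are assumptions made for the example, not the format Azure AI Search or any real connector uses.

```python
import base64
import struct


def encode_embedding_for_wire(embedding: list[float]) -> str:
    """Pack floats as little-endian float32 and base64-encode them (assumed wire format)."""
    raw = struct.pack(f"<{len(embedding)}f", *embedding)
    return base64.b64encode(raw).decode("ascii")


def decode_embedding_from_wire(payload: str) -> list[float]:
    """Reverse of encode_embedding_for_wire."""
    raw = base64.b64decode(payload)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


class SketchConnector:
    """Hypothetical connector: callers pass and receive plain float lists."""

    def __init__(self) -> None:
        self._server: dict[str, str] = {}  # stands in for the remote store

    def upsert(self, key: str, embedding: list[float]) -> None:
        # The base64 step happens here, invisible to the caller.
        self._server[key] = encode_embedding_for_wire(embedding)

    def get(self, key: str) -> list[float]:
        return decode_embedding_from_wire(self._server[key])
```

The caller never sees a base64 string, so the same calling code would keep working against a store that expects a different encoding. Note that values round-trip exactly only if they are representable as float32.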
Missing functionality will make it hard to work with existing indexes (created by other apps/services). Missing scalability options.
Suggestion: extend the connector to cover these features:
Need more clarity