feat: add Azure Search as VectorStore provider #2396

MatheusBordin · 2023-08-24T14:35:19Z

Add support for the Azure Cognitive Search vector store, based on the Python implementation, with some improvements:

Add batch indexing on embeddings call.
Add batch indexing on azure-search call.
Add support for custom attributes (on document metadata).
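
The batching mentioned in the first two items can be illustrated with a minimal sketch. It assumes a hypothetical `BatchEmbedder` interface and helper names (`chunk`, `embedInBatches`) that are not taken from this PR; the actual implementation may batch differently.

```typescript
// Hypothetical minimal embedder shape (an assumption, not the PR's type).
interface BatchEmbedder {
  embedDocuments(texts: string[]): Promise<number[][]>;
}

// Split an array into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Embed texts one batch at a time, preserving the input order.
async function embedInBatches(
  embedder: BatchEmbedder,
  texts: string[],
  batchSize: number
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of chunk(texts, batchSize)) {
    vectors.push(...(await embedder.embedDocuments(batch)));
  }
  return vectors;
}
```

The same `chunk` helper could be reused for the upload step, so that neither the embeddings call nor the indexing call receives an unbounded payload.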

TODO:

vercel · 2023-08-24T14:35:22Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Updated (UTC)
langchainjs-docs	✅ Ready (Inspect)	Visit Preview	Aug 24, 2023 3:05pm

glorat · 2023-09-27T09:36:30Z

langchain/src/vectorstores/azuresearch.ts

+        fields: [DEFAULT_FIELD_CONTENT_VECTOR],
+        kNearestNeighborsCount: k,
+      }],
+      filter,


like the other search methods, needs to have a top: k field passed in

When searching by vector similarity, the parameter that limits the results is kNearestNeighborsCount.
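
The two limits discussed here can be set side by side. The sketch below uses local, hypothetical types that only mirror the fields visible in the diff (`fields`, `kNearestNeighborsCount`, `filter`); the `vectors` and `value` names and the `top` placement are assumptions rather than confirmed SDK shapes. Whether `top` is strictly required for a pure vector query is the open question in this thread, so passing both is shown only as one option.

```typescript
// Local sketch types; these are NOT the SDK's own types.
interface VectorQuerySketch {
  value: number[];
  fields: string[];
  kNearestNeighborsCount: number;
}

interface SearchOptionsSketch {
  vectors: VectorQuerySketch[];
  filter?: string;
  top?: number;
}

// Assumed field name, taken from the content_vector key in the diff.
const CONTENT_VECTOR_FIELD = "content_vector";

// Build options where both the per-vector limit and the overall limit equal k.
function buildVectorSearchOptions(
  queryVector: number[],
  k: number,
  filter?: string
): SearchOptionsSketch {
  return {
    vectors: [
      {
        value: queryVector,
        fields: [CONTENT_VECTOR_FIELD],
        kNearestNeighborsCount: k,
      },
    ],
    // Only include the filter key when a filter was actually given.
    ...(filter !== undefined ? { filter } : {}),
    top: k,
  };
}
```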

glorat · 2023-09-27T09:41:43Z

langchain/src/vectorstores/azuresearch.ts

+    const indexClient = new SearchIndexClient(endpoint, credential);
+
+    try {
+      await indexClient.getIndex(indexName);


It seems that even getIndex requires an admin key and will fail with a query key. So if you pass in a query key, an exception gets thrown, after which the createIndex call also fails.

Yeah, the implementation requires an admin key, but if you want to use a query key you can pass a SearchClient instead of indexName. In that case I do not validate whether the index exists, and everything works.

Here is an example:

AzureSearchStore.create({ client: new SearchClient(...), search: {...} });

I think it would be nice to have that in the documentation, along with some tests to validate that approach. Does that make sense?
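
The behavior described in this exchange can be sketched as follows. All types and names here (`IndexAdminSketch`, `SearchClientSketch`, `resolveIndexName`) are hypothetical stand-ins written for illustration, not the SDK's classes or the PR's actual code.

```typescript
// Hypothetical stand-ins, reduced to what this sketch needs.
interface IndexAdminSketch {
  getIndex(name: string): Promise<unknown>;
  createIndex(name: string): Promise<void>;
}

interface SearchClientSketch {
  indexName: string;
}

// Either a ready-made client (query key is enough) or an index name plus
// an admin-capable client (admin key required).
type StoreConfigSketch =
  | { client: SearchClientSketch }
  | { indexName: string; admin: IndexAdminSketch };

async function resolveIndexName(config: StoreConfigSketch): Promise<string> {
  if ("client" in config) {
    // A client was supplied: skip the existence check entirely.
    return config.client.indexName;
  }
  try {
    await config.admin.getIndex(config.indexName);
  } catch {
    // Any failure is treated as "index missing" in this sketch.
    await config.admin.createIndex(config.indexName);
  }
  return config.indexName;
}
```

Because this sketch treats every getIndex failure as a missing index, a query key would trigger the double failure described above; a more careful version might inspect the error before attempting to create the index.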

lucasbuges · 2023-09-27T16:42:07Z

Up!

MatheusBordin · 2023-09-27T17:04:48Z

I'm working on unit and integration tests and will push them by the weekend.

izzymsft · 2023-10-07T17:37:08Z

Hi @MatheusBordin thanks for working on this. I see that you are already working on the tests. Please let me know if I can help or support in any way. Thanks.

dexteresc · 2023-10-17T12:47:11Z

Is this still being worked on? If not I can take over from where @MatheusBordin started.

izzymsft · 2023-10-17T15:53:08Z

> Is this still being worked on? If not I can take over from where @MatheusBordin started.

@dexteresc I had the same question last week. Let's give Matheus a few days to respond (maybe until October 19) and then maybe we can collaborate to take it to the finish line. Quite a lot of folks are looking forward to this feature being added to Langchain JS. It's almost done from what I can see so far.

MatheusBordin · 2023-10-19T00:29:23Z

Hi guys @izzymsft @dexteresc, the last few weeks were crazy here, so I had to focus elsewhere and paused development of this feature. I'm submitting all the changes I have already made, and I think we can work together to finish it.

izzymsft · 2023-10-19T02:34:44Z

Thank you for sharing the update @MatheusBordin

When you see this please let me know how I can assist. I can write tests, review the code and help with examples.

You can grant me access to your repo as a collaborator or if you would prefer I can fork your repo and send you PRs to your own fork of the original langchainjs repo.

MatheusBordin · 2023-10-19T02:36:42Z

Hey @izzymsft, I'm working right now on the remaining (integration) tests; after that I think we can work on the documentation.

Yeah, I can add you as a collaborator. Please check your access within the next 5 minutes.

izzymsft · 2023-10-19T03:04:58Z

Yes, I have accepted the invite and I was able to make an edit. Thanks.

MatheusBordin@dd2f25f

MatheusBordin · 2023-10-19T03:15:01Z

@izzymsft thanks for your help.

I have pushed a commit with the integration tests implemented; unfortunately they are not passing. It seems to be a problem with my OpenAI organization: the embedding API times out. I'll try to fix it tomorrow. Anyway, this implementation has already been used in production for a couple of weeks and works fine. Once the integration tests are fixed and the documentation is written, I think we are ready to release.

izzymsft · 2023-10-19T03:26:48Z

@MatheusBordin no problem

I think for now you can switch to FakeEmbeddings (a mock embedding) so that the tests are not unstable.

You can see how other integration tests in the same directory (Redis/Chroma) are doing it.

I will attempt to run it with my Azure OpenAI settings to see if it is still failing tomorrow
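
To illustrate the idea of a mock embedding, here is a minimal sketch of a deterministic stand-in. The class name `DeterministicFakeEmbeddingsSketch` and its hashing scheme are hypothetical and written only for this example; this is not the library's FakeEmbeddings class, whose behavior may differ.

```typescript
// Hypothetical deterministic embedder for tests: no network calls, and the
// same text always produces the same vector.
class DeterministicFakeEmbeddingsSketch {
  private readonly dimensions: number;

  constructor(dimensions = 4) {
    this.dimensions = dimensions;
  }

  // Fold character codes into a fixed-size vector.
  async embedQuery(text: string): Promise<number[]> {
    const vector: number[] = new Array<number>(this.dimensions).fill(0);
    for (let i = 0; i < text.length; i += 1) {
      const slot = i % this.dimensions;
      vector[slot] = (vector[slot] ?? 0) + text.charCodeAt(i) / 255;
    }
    return vector;
  }

  async embedDocuments(texts: string[]): Promise<number[][]> {
    return Promise.all(texts.map((text) => this.embedQuery(text)));
  }
}
```

Because the vectors depend only on the input text, tests built on such a stand-in do not time out or vary between runs, which is the instability being worked around here.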

MatheusBordin · 2023-10-19T11:56:04Z

Status update: I pushed the remaining tests just now. Only the documentation is missing before the PR can be published.

Thanks @izzymsft for the idea of using FakeEmbeddings in the integration tests. I found some issues with the mock implementation, but I won't fix them in this pull request because that could delay the release.

MatheusBordin · 2023-10-19T13:02:17Z

Status update: I wrote the documentation with a simple quick start, which I think is enough to publish. For now I'm publishing the feature for review; let's wait...

izzymsft · 2023-10-19T13:54:52Z

@MatheusBordin thanks for the update. I have made some changes to the documentation as well to add some links with more information on the vector search capabilities. Let's see if the langchain maintainers have any feedback on the PR.

Thank you for all your efforts on taking this to the finish line.

lucasbuges · 2023-10-19T16:46:35Z

@MatheusBordin @izzymsft @glorat @leofmarciano thanks for the update!!

jacoblee93 · 2023-10-23T23:43:07Z

Ah apologies, didn't realize this was ready! Will have a look now.

jacoblee93 · 2023-10-23T23:44:28Z

langchain/package.json

@@ -887,6 +891,9 @@
    "@aws-sdk/client-sagemaker-runtime": "^3.310.0",
    "@aws-sdk/client-sfn": "^3.310.0",
    "@aws-sdk/credential-provider-node": "^3.388.0",
+    "@aws-sdk/protocol-http": "^3.374.0",


I think these AWS deps should be removed? Maybe a bad merge?

Yeah, looks like a bad merge. I will remove that

jacoblee93 · 2023-10-23T23:50:11Z

langchain/src/vectorstores/azuresearch.ts

+/**
+ * Define metadata schema.
+ *
+ * If yout want to add custom data, use the attributes property.


jacoblee93 · 2023-10-23T23:51:27Z

langchain/src/vectorstores/azuresearch.ts

+ */
+export type AzureSearchDocumentMetadata = {
+  source: string;
+  attributes?: Array<{ key: string; value: string; }>;


I don't love this restriction as many advanced retrievers will rely on being able to set metadata as they please (they won't be aware of the attributes field) - do we think we could just do the mapping from AzureSearchDocumentMetadata.attributes to arbitrary metadata?

Or what is the purpose of source here?

Thinking about how developers will use this feature, I used the attributes field so that developers do not need to specify the schema of their documents (as the Python LangChain implementation does), because Cognitive Search has no dynamic fields.

But none of the other vector stores follow that pattern. Your suggestion about creating a map is good; I will try to implement that.

> Or what is the purpose of source here?

In my case the 'source' field is required because it describes where the document comes from. I'll remove this constraint.
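
The mapping suggested in this thread could look roughly like the sketch below. The function names are hypothetical, and JSON-encoding each value into the string-typed `value` field is an assumption made for illustration; the eventual implementation may encode values differently.

```typescript
// Mirrors the { key, value } pairs in the attributes field from the diff.
type AttributeSketch = { key: string; value: string };

// Flatten arbitrary metadata into key/value string pairs.
// Values are JSON-encoded so that numbers, arrays, etc. survive a round trip.
function metadataToAttributes(
  metadata: Record<string, unknown>
): AttributeSketch[] {
  return Object.entries(metadata)
    .filter(([, value]) => value !== undefined)
    .map(([key, value]) => ({ key, value: JSON.stringify(value) }));
}

// Rebuild the metadata object from stored attributes.
function attributesToMetadata(
  attributes: AttributeSketch[]
): Record<string, unknown> {
  const metadata: Record<string, unknown> = {};
  for (const { key, value } of attributes) {
    metadata[key] = JSON.parse(value);
  }
  return metadata;
}
```

With a mapping like this, callers could keep passing plain metadata objects, and the fixed index schema would stay an internal detail of the store.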

jacoblee93 · 2023-10-23T23:54:42Z

langchain/src/vectorstores/azuresearch.ts

+      content_vector: vectors[idx],
+      metadata: {
+        source: doc.metadata?.source,
+        attributes: doc.metadata?.attributes ?? [],


jacoblee93 · 2023-10-23T23:56:40Z

langchain/src/vectorstores/azuresearch.ts

+    const searchType = this.params.search.type;
+    let results: [Document, number][] = [];
+
+    if (searchType === "similarity") {


We've tended to do this at the retriever level:

langchainjs/langchain/src/vectorstores/base.ts

Line 100 in a084aae

if (this.searchType === "mmr") {

But I'm ok with putting it here since a lot of these are quite specific.
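
For illustration, dispatching on the search type inside the store might be sketched as below. Only the "similarity" type appears in the diff; the "hybrid" type, the class name, and the handler signature are hypothetical and exist only for this example.

```typescript
// "similarity" comes from the diff; "hybrid" is a hypothetical second type.
type SearchTypeSketch = "similarity" | "hybrid";

// A handler returns [text, score] pairs for a query.
type SearchHandlerSketch = (query: string, k: number) => Array<[string, number]>;

class StoreWithSearchTypesSketch {
  private readonly searchType: SearchTypeSketch;

  private readonly handlers: Record<SearchTypeSketch, SearchHandlerSketch>;

  constructor(
    searchType: SearchTypeSketch,
    handlers: Record<SearchTypeSketch, SearchHandlerSketch>
  ) {
    this.searchType = searchType;
    this.handlers = handlers;
  }

  // The store, not the retriever, picks the handler for the configured type.
  search(query: string, k: number): Array<[string, number]> {
    return this.handlers[this.searchType](query, k);
  }
}
```

Keeping the dispatch in the store means provider-specific search modes never need to be known by a generic retriever, which matches the reasoning given for accepting it here.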

jacoblee93

I'm willing to merge this as is but see comments

izzymsft · 2023-10-24T14:40:51Z

> I'm willing to merge this as is but see comments

Thanks @jacoblee93 we will work on modifying it as soon as possible. I appreciate the feedback. We will take a look at the other implementations to see how it is done.

jacoblee93 · 2023-10-24T20:46:09Z

Here's an example of one circumstance where a retriever would require arbitrary metadata for docs:

https://js.langchain.com/docs/modules/data_connection/retrievers/how_to/multi-vector-retriever#summary

I would foresee more coming as well as it's quite useful for some advanced retrieval techniques.

deejiw · 2023-11-20T16:03:58Z

Any update so far? This feature would be a big leap for JS flavor.

farzad528 · 2023-11-29T14:01:11Z

Hi team, I wanted to alert everyone that we have a new stable release that I would recommend you all use for this PR. Please note there are a couple of breaking changes.

See sample: https://github.com/Azure/azure-search-vector-samples/blob/main/demo-javascript/JavaScriptVectorDemo/code/azure-search-vector-sample.js
See source code: https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/search/search-documents
See REST API docs: https://learn.microsoft.com/en-us/javascript/api/%40azure/search-documents/?view=azure-node-latest
cc: @izzymsft @MatheusBordin

MatheusBordin · 2023-12-13T20:14:26Z

Hi guys, I was working on other things over the last few weeks and couldn't make any progress here, but I'm working on this right now and may have updates for this implementation in a few days.

@farzad528 thank you for the warning. I have already updated the code for the new API and everything is working fine (I still need to push the code).

sinedied · 2023-12-18T08:51:00Z

@MatheusBordin thanks for your work on this!

Could I provide some help on this? Whether it's helping with the code, documentation, or tests I can help.
I'm working at Microsoft and can get in touch with the SDK team if needed.

I noticed you're using an old beta of the Document Search SDK, and the 12.0.0 stable version has been released with some breaking changes. It would also be helpful I think to add in the documentation how you can use managed identity instead of keys, as it's part of security best practices.

Don't hesitate to ping me if you'd like some help 🙂

sinedied · 2024-01-16T20:15:42Z

Hi @MatheusBordin, I forked your original branch (keeping your commits and credits) and completed the PR here: #4044

MatheusBordin added 3 commits August 24, 2023 11:12

feat: add vectorestore imp for azure-search

db8b888

chore: update lock

541640d

chore: add azure-search deps to package.json

5161c9c

dosubot bot added the auto:enhancement label Aug 24, 2023

vercel bot had a problem deploying to Preview August 24, 2023 14:35 Failure

MatheusBordin added 2 commits August 24, 2023 11:38

chore: add azure-search deps

5306fe8

chore: add Azure Search to entrypoint's script

9903983

vercel bot deployed to Preview August 24, 2023 15:05 View deployment

ladrians mentioned this pull request Sep 14, 2023

Azure Cognitive Search #2349

Closed

glorat reviewed Sep 27, 2023

leofmarciano approved these changes Sep 27, 2023

chore: add unit tests to prevent changes on client behavior

7328983

Added Documentation to makeSearchIndex()

dd2f25f

Matheus Bordin added 2 commits October 19, 2023 00:09

add implementation for some integration tests

db48a0e

Merge branch 'feat/azure-search' of https://github.com/MatheusBordin/…

11625b6

…langchainjs into feat/azure-search

Matheus Bordin added 2 commits October 19, 2023 08:36

refact: use FakeEmbeddings instead of OpenAIEmbeddings in AzureSearch…

cddb686

… integration tests

chore: add remaining tests

aa9272c

Matheus Bordin added 2 commits October 19, 2023 09:45

add documentation

1d4cae3

Merge branch 'main' of https://github.com/langchain-ai/langchainjs in…

b385918

…to feat/azure-search

MatheusBordin marked this pull request as ready for review October 19, 2023 13:00

added additional documentation for Azure Cognitive Search vector sear…

a37cf29

…ch capabilities

jacoblee93 reviewed Oct 23, 2023

jacoblee93 self-assigned this Oct 23, 2023

jacoblee93 added the lgtm and question labels Oct 23, 2023

jacoblee93 added the hold label Oct 24, 2023

sinedied mentioned this pull request Jan 16, 2024

community[minor]: Add support for Azure AI Search vector store #4044

Merged
