diff --git a/docs/api_refs/blacklisted-entrypoints.json b/docs/api_refs/blacklisted-entrypoints.json index e1b4fa28e3a6..05088f93bdc6 100644 --- a/docs/api_refs/blacklisted-entrypoints.json +++ b/docs/api_refs/blacklisted-entrypoints.json @@ -58,6 +58,7 @@ "../../langchain/src/vectorstores/faiss.ts", "../../langchain/src/vectorstores/weaviate.ts", "../../langchain/src/vectorstores/lancedb.ts", + "../../langchain/src/vectorstores/mariadb.ts", "../../langchain/src/vectorstores/momento_vector_index.ts", "../../langchain/src/vectorstores/mongodb_atlas.ts", "../../langchain/src/vectorstores/pinecone.ts", diff --git a/docs/core_docs/.gitignore b/docs/core_docs/.gitignore index 9353e462637e..03c560e92290 100644 --- a/docs/core_docs/.gitignore +++ b/docs/core_docs/.gitignore @@ -250,6 +250,8 @@ docs/integrations/vectorstores/mongodb_atlas.md docs/integrations/vectorstores/mongodb_atlas.mdx docs/integrations/vectorstores/memory.md docs/integrations/vectorstores/memory.mdx +docs/integrations/vectorstores/mariadb.md +docs/integrations/vectorstores/mariadb.mdx docs/integrations/vectorstores/hnswlib.md docs/integrations/vectorstores/hnswlib.mdx docs/integrations/vectorstores/faiss.md diff --git a/docs/core_docs/docs/how_to/indexing.mdx b/docs/core_docs/docs/how_to/indexing.mdx index 846b59b504dc..ebb9de356cda 100644 --- a/docs/core_docs/docs/how_to/indexing.mdx +++ b/docs/core_docs/docs/how_to/indexing.mdx @@ -63,7 +63,7 @@ When content is mutated (e.g., the source PDF file was revised) there will be a b). 
delete by id (delete method with ids argument) Compatible Vectorstores: [`PGVector`](/docs/integrations/vectorstores/pgvector), [`Chroma`](/docs/integrations/vectorstores/chroma), [`CloudflareVectorize`](/docs/integrations/vectorstores/cloudflare_vectorize), -[`ElasticVectorSearch`](/docs/integrations/vectorstores/elasticsearch), [`FAISS`](/docs/integrations/vectorstores/faiss), [`MomentoVectorIndex`](/docs/integrations/vectorstores/momento_vector_index), +[`ElasticVectorSearch`](/docs/integrations/vectorstores/elasticsearch), [`FAISS`](/docs/integrations/vectorstores/faiss), [`MariaDB`](/docs/integrations/vectorstores/mariadb), [`MomentoVectorIndex`](/docs/integrations/vectorstores/momento_vector_index), [`Pinecone`](/docs/integrations/vectorstores/pinecone), [`SupabaseVectorStore`](/docs/integrations/vectorstores/supabase), [`VercelPostgresVectorStore`](/docs/integrations/vectorstores/vercel_postgres), [`Weaviate`](/docs/integrations/vectorstores/weaviate), [`Xata`](/docs/integrations/vectorstores/xata) diff --git a/docs/core_docs/docs/integrations/vectorstores/mariadb.ipynb b/docs/core_docs/docs/integrations/vectorstores/mariadb.ipynb new file mode 100644 index 000000000000..11d21e36a17d --- /dev/null +++ b/docs/core_docs/docs/integrations/vectorstores/mariadb.ipynb @@ -0,0 +1,503 @@ +{ + "cells": [ + { + "cell_type": "raw", + "id": "1957f5cb", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "---\n", + "sidebar_label: MariaDB\n", + "sidebar_class_name: node-only\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "ef1f0986", + "metadata": {}, + "source": [ + "# MariaDB\n", + "\n", + "```{=mdx}\n", + ":::tip Compatibility\n", + "Only available on Node.js.\n", + ":::\n", + "```\n", + "\n", + "This integration requires MariaDB 11.7 or later.\n", + "\n", + "This guide provides a quick overview for getting started with MariaDB [vector stores](/docs/concepts/#vectorstores). 
For detailed documentation of all `MariaDBStore` features and configurations head to the [API reference](https://api.js.langchain.com/classes/langchain_community_vectorstores_mariadb.MariaDBStore.html)." + ] + }, + { + "cell_type": "markdown", + "id": "c824838d", + "metadata": {}, + "source": [ + "## Overview\n", + "\n", + "### Integration details\n", + "\n", + "| Class | Package | [PY support](https://python.langchain.com/docs/integrations/vectorstores/mariadb/) | Package latest |\n", + "| :--- | :--- | :---: | :---: |\n", + "| [`MariaDBStore`](https://api.js.langchain.com/classes/langchain_community_vectorstores_mariadb.MariaDBStore.html) | [`@langchain/community`](https://npmjs.com/@langchain/community) | ✅ | ![NPM - Version](https://img.shields.io/npm/v/@langchain/community?style=flat-square&label=%20&) |" + ] + }, + { + "cell_type": "markdown", + "id": "36fdc060", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "To use MariaDB vector stores, you'll need to set up MariaDB 11.7 or later and install the [`mariadb`](https://www.npmjs.com/package/mariadb) connector as a peer dependency.\n", + "\n", + "This guide will also use [OpenAI embeddings](/docs/integrations/text_embedding/openai), which require you to install the `@langchain/openai` integration package. 
You can also use [other supported embeddings models](/docs/integrations/text_embedding) if you wish.\n", + "\n", + "We'll also use the [`uuid`](https://www.npmjs.com/package/uuid) package to generate ids in the required format.\n", + "\n", + "```{=mdx}\n", + "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n", + "import Npm2Yarn from \"@theme/Npm2Yarn\";\n", + "\n", + "\n", + "\n", + "\n", + " @langchain/community @langchain/openai @langchain/core mariadb uuid\n", + "\n", + "```\n", + "\n", + "### Setting up an instance\n", + "\n", + "Create a file named `docker-compose.yml` with the following content:\n", + "\n", + "```yaml\n", + "# Run this command to start the database:\n", + "# docker-compose up --build\n", + "version: \"3\"\n", + "services:\n", + " db:\n", + " hostname: 127.0.0.1\n", + " image: mariadb/mariadb:11.7-rc\n", + " ports:\n", + " - 3306:3306\n", + " restart: always\n", + " environment:\n", + " - MARIADB_DATABASE=api\n", + " - MARIADB_USER=myuser\n", + " - MARIADB_PASSWORD=ChangeMe\n", + " - MARIADB_ROOT_PASSWORD=ChangeMe\n", + " volumes:\n", + " - ./init.sql:/docker-entrypoint-initdb.d/init.sql\n", + "```\n", + "\n", + "Then, in the same directory, run `docker compose up` to start the container.\n", + "\n", + "### Credentials\n", + "\n", + "To connect to your MariaDB instance, you'll need the corresponding credentials. 
For a full list of supported options, see the [`mariadb` docs](https://github.com/mariadb-corporation/mariadb-connector-nodejs/blob/master/documentation/promise-api.md#connection-options).\n", + "\n", + "If you are using OpenAI embeddings for this guide, you'll need to set your OpenAI key as well:\n", + "\n", + "```typescript\n", + "process.env.OPENAI_API_KEY = \"YOUR_API_KEY\";\n", + "```\n", + "\n", + "If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:\n", + "\n", + "```typescript\n", + "// process.env.LANGCHAIN_TRACING_V2=\"true\"\n", + "// process.env.LANGCHAIN_API_KEY=\"your-api-key\"\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "93df377e", + "metadata": {}, + "source": [ + "## Instantiation\n", + "\n", + "To instantiate the vector store, call the `.initialize()` static method. This will automatically check for the presence of a table, given by `tableName` in the passed `config`. 
If it is not there, it will create it with the required columns.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "dc37144c-208d-4ab3-9f3a-0407a69fe052", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import { OpenAIEmbeddings } from \"@langchain/openai\";\n", + "\n", + "import {\n", + " DistanceStrategy,\n", + " MariaDBStore,\n", + "} from \"@langchain/community/vectorstores/mariadb\";\n", + "import { PoolConfig } from \"mariadb\";\n", + "\n", + "const config = {\n", + " connectionOptions: {\n", + " type: \"mariadb\",\n", + " host: \"127.0.0.1\",\n", + " port: 3306,\n", + " user: \"myuser\",\n", + " password: \"ChangeMe\",\n", + " database: \"api\",\n", + " } as PoolConfig,\n", + " distanceStrategy: 'EUCLIDEAN' as DistanceStrategy,\n", + "};\n", + "const vectorStore = await MariaDBStore.initialize(\n", + " new OpenAIEmbeddings(),\n", + " config\n", + ");" + ] + }, + { + "cell_type": "markdown", + "id": "ac6071d4", + "metadata": {}, + "source": [ + "## Manage vector store\n", + "\n", + "### Add items to vector store" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "17f5efc0", + "metadata": {}, + "outputs": [], + "source": [ + "import { v4 as uuidv4 } from \"uuid\";\n", + "import type { Document } from \"@langchain/core/documents\";\n", + "\n", + "const document1: Document = {\n", + " pageContent: \"The powerhouse of the cell is the mitochondria\",\n", + " metadata: { source: \"https://example.com\" }\n", + "};\n", + "\n", + "const document2: Document = {\n", + " pageContent: \"Buildings are made out of brick\",\n", + " metadata: { source: \"https://example.com\" }\n", + "};\n", + "\n", + "const document3: Document = {\n", + " pageContent: \"Mitochondria are made out of lipids\",\n", + " metadata: { source: \"https://example.com\" }\n", + "};\n", + "\n", + "const document4: Document = {\n", + " pageContent: \"The 2024 Olympics are in Paris\",\n", + " metadata: { source: \"https://example.com\" 
}\n", + "}\n", + "\n", + "const documents = [document1, document2, document3, document4];\n", + "\n", + "const ids = [uuidv4(), uuidv4(), uuidv4(), uuidv4()]\n", + "\n", + "// ids are optional; they are passed here so a document can be deleted by id later\n", + "await vectorStore.addDocuments(documents, { ids: ids });" + ] + }, + { + "cell_type": "markdown", + "id": "dcf1b905", + "metadata": {}, + "source": [ + "### Delete items from vector store" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "ef61e188", + "metadata": {}, + "outputs": [], + "source": [ + "const id4 = ids[ids.length - 1];\n", + "\n", + "await vectorStore.delete({ ids: [id4] });" + ] + }, + { + "cell_type": "markdown", + "id": "c3620501", + "metadata": {}, + "source": [ + "## Query vector store\n", + "\n", + "Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent. \n", + "\n", + "### Query directly\n", + "\n", + "Performing a simple similarity search can be done as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "aa0a16fa", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* The powerhouse of the cell is the mitochondria [{\"year\": 2021}]\n", + "* Mitochondria are made out of lipids [{\"year\": 2022}]\n" + ] + } + ], + "source": [ + "const similaritySearchResults = await vectorStore.similaritySearch(\"biology\", 2, { \"year\": 2021 });\n", + "for (const doc of similaritySearchResults) {\n", + " console.log(`* ${doc.pageContent} [${JSON.stringify(doc.metadata, null)}]`);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "3ed9d733", + "metadata": {}, + "source": [ + "The filter syntax can also express more complex conditions:\n", + "\n", + "```typescript\n", + "// name = 'martin' OR firstname = 'john'\n", + "let res = await vectorStore.similaritySearch(\"biology\", 2, {\"$or\": [{\"name\":\"martin\"}, {\"firstname\": \"john\"}] });\n", + 
"```\n", + "\n", + "If you want to execute a similarity search and receive the corresponding scores you can run:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "5efd2eaa", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* [SIM=0.835] The powerhouse of the cell is the mitochondria [{\"source\":\"https://example.com\"}]\n", + "* [SIM=0.852] Mitochondria are made out of lipids [{\"source\":\"https://example.com\"}]\n" + ] + } + ], + "source": [ + "const similaritySearchWithScoreResults = await vectorStore.similaritySearchWithScore(\"biology\", 2)\n", + "\n", + "for (const [doc, score] of similaritySearchWithScoreResults) {\n", + " console.log(`* [SIM=${score.toFixed(3)}] ${doc.pageContent} [${JSON.stringify(doc.metadata)}]`);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "0c235cdc", + "metadata": {}, + "source": [ + "### Query by turning into retriever\n", + "\n", + "You can also transform the vector store into a [retriever](/docs/concepts/retrievers) for easier usage in your chains. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "f3460093", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[\n", + " Document {\n", + " pageContent: 'The powerhouse of the cell is the mitochondria',\n", + " metadata: { source: 'https://example.com' },\n", + " id: undefined\n", + " },\n", + " Document {\n", + " pageContent: 'Mitochondria are made out of lipids',\n", + " metadata: { source: 'https://example.com' },\n", + " id: undefined\n", + " }\n", + "]\n" + ] + } + ], + "source": [ + "const retriever = vectorStore.asRetriever({\n", + " // Optional filter\n", + " // filter: filter,\n", + " k: 2,\n", + "});\n", + "await retriever.invoke(\"biology\");" + ] + }, + { + "cell_type": "markdown", + "id": "e2e0a211", + "metadata": {}, + "source": [ + "### Usage for retrieval-augmented generation\n", + "\n", + "For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n", + "\n", + "- [Tutorials: working with external knowledge](/docs/tutorials/#working-with-external-knowledge).\n", + "- [How-to: Question and answer with RAG](/docs/how_to/#qa-with-rag)\n", + "- [Retrieval conceptual docs](/docs/concepts/retrieval)" + ] + }, + { + "cell_type": "markdown", + "id": "371727a8", + "metadata": {}, + "source": [ + "## Advanced: reusing connections\n", + "\n", + "You can reuse connections by creating a pool, then creating new `MariaDBStore` instances directly via the constructor.\n", + "\n", + "Note that you should call `.initialize()` to set up your database at least once to set up your tables properly before using the constructor." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "09efeac4", + "metadata": {}, + "outputs": [], + "source": [ + "import { OpenAIEmbeddings } from \"@langchain/openai\";\n", + "import { MariaDBStore } from \"@langchain/community/vectorstores/mariadb\";\n", + "import mariadb from \"mariadb\";\n", + "\n", + "// First, follow set-up instructions at\n", + "// https://js.langchain.com/docs/modules/indexes/vector_stores/integrations/mariadb\n", + "\n", + "const reusablePool = mariadb.createPool({\n", + " host: \"127.0.0.1\",\n", + " port: 3306,\n", + " user: \"myuser\",\n", + " password: \"ChangeMe\",\n", + " database: \"api\",\n", + "});\n", + "\n", + "const originalConfig = {\n", + " pool: reusablePool,\n", + " tableName: \"testlangchainjs\",\n", + " collectionName: \"sample\",\n", + " collectionTableName: \"collections\",\n", + " columns: {\n", + " idColumnName: \"id\",\n", + " vectorColumnName: \"vect\",\n", + " contentColumnName: \"content\",\n", + " metadataColumnName: \"metadata\",\n", + " },\n", + "};\n", + "\n", + "// Set up the DB.\n", + "// Can skip this step if you've already initialized the DB.\n", + "// await MariaDBStore.initialize(new OpenAIEmbeddings(), originalConfig);\n", + "const mariadbStore = new MariaDBStore(new OpenAIEmbeddings(), originalConfig);\n", + "\n", + "await mariadbStore.addDocuments([\n", + " { pageContent: \"what's this\", metadata: { a: 2 } },\n", + " { pageContent: \"Cat drinks milk\", metadata: { a: 1 } },\n", + "]);\n", + "\n", + "const results = await mariadbStore.similaritySearch(\"water\", 1);\n", + "\n", + "console.log(results);\n", + "\n", + "/*\n", + " [ Document { pageContent: 'Cat drinks milk', metadata: { a: 1 } } ]\n", + "*/\n", + "\n", + "const mariadbStore2 = new MariaDBStore(new OpenAIEmbeddings(), {\n", + " pool: reusablePool,\n", + " tableName: \"testlangchainjs\",\n", + " collectionTableName: \"collections\",\n", + " collectionName: \"some_other_collection\",\n", + " columns: {\n", + " 
idColumnName: \"id\",\n", + " vectorColumnName: \"vect\",\n", + " contentColumnName: \"content\",\n", + " metadataColumnName: \"metadata\",\n", + " },\n", + "});\n", + "\n", + "const results2 = await mariadbStore2.similaritySearch(\"water\", 1);\n", + "\n", + "console.log(results2);\n", + "\n", + "/*\n", + " []\n", + "*/\n", + "\n", + "await reusablePool.end();" + ] + }, + { + "cell_type": "markdown", + "id": "069f1b5f", + "metadata": {}, + "source": [ + "## Closing connections\n", + "\n", + "Make sure you close the connection when you are finished to avoid excessive resource consumption:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f71ce986", + "metadata": {}, + "outputs": [], + "source": [ + "await vectorStore.end();" + ] + }, + { + "cell_type": "markdown", + "id": "8a27244f", + "metadata": {}, + "source": [ + "## API reference\n", + "\n", + "For detailed documentation of all `MariaDBStore` features and configurations head to the [API reference](https://api.js.langchain.com/classes/langchain_community_vectorstores_mariadb.MariaDBStore.html)." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "TypeScript", + "language": "typescript", + "name": "tslab" + }, + "language_info": { + "codemirror_mode": { + "mode": "typescript", + "name": "javascript", + "typescript": true + }, + "file_extension": ".ts", + "mimetype": "text/typescript", + "name": "typescript", + "version": "3.7.2" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/core_docs/src/theme/FeatureTables.js b/docs/core_docs/src/theme/FeatureTables.js index b2dec1539d53..c36ac434db1c 100644 --- a/docs/core_docs/src/theme/FeatureTables.js +++ b/docs/core_docs/src/theme/FeatureTables.js @@ -673,6 +673,19 @@ const FEATURE_TABLES = { local: true, idsInAddDocuments: false, }, + { + name: "mariadb", + link: "mariadb", + deleteById: true, + filtering: true, + searchByVector: true, + searchWithScore: true, + async: true, + passesStandardTests: false, + multiTenancy: false, + local: true, + idsInAddDocuments: false, + }, { name: "Milvus", link: "milvus", diff --git a/examples/package.json b/examples/package.json index 3f392a66bccf..a4cf798e8d32 100644 --- a/examples/package.json +++ b/examples/package.json @@ -95,6 +95,7 @@ "js-yaml": "^4.1.0", "langchain": "workspace:*", "langsmith": ">=0.2.8 <0.4.0", + "mariadb": "^3.4.0", "mongodb": "^6.3.0", "pg": "^8.11.0", "pickleparser": "^0.2.1", diff --git a/examples/src/indexes/vector_stores/mariadb_vectorstore/docker-compose.example.yml b/examples/src/indexes/vector_stores/mariadb_vectorstore/docker-compose.example.yml new file mode 100644 index 000000000000..c27be10e888a --- /dev/null +++ b/examples/src/indexes/vector_stores/mariadb_vectorstore/docker-compose.example.yml @@ -0,0 +1,10 @@ +services: + db: + image: mariadb/mariadb:11.7-rc + ports: + - 3306:3306 + environment: + - MARIADB_USER=myuser + - MARIADB_PASSWORD=ChangeMe + - MARIADB_ROOT_PASSWORD=ChangeMe + - MARIADB_DATABASE=api diff --git a/examples/src/indexes/vector_stores/mariadb_vectorstore/mariadb.ts 
b/examples/src/indexes/vector_stores/mariadb_vectorstore/mariadb.ts new file mode 100644 index 000000000000..23af634fc200 --- /dev/null +++ b/examples/src/indexes/vector_stores/mariadb_vectorstore/mariadb.ts @@ -0,0 +1,54 @@ +import { OpenAIEmbeddings } from "@langchain/openai"; +import { + DistanceStrategy, + MariaDBStore, +} from "@langchain/community/vectorstores/mariadb"; +import { PoolConfig } from "mariadb"; + +// First, follow set-up instructions at +// https://js.langchain.com/docs/modules/indexes/vector_stores/integrations/mariadb + +const config = { + connectionOptions: { + type: "mariadb", + host: "127.0.0.1", + port: 3306, + user: "myuser", + password: "ChangeMe", + database: "api", + } as PoolConfig, + distanceStrategy: "EUCLIDEAN" as DistanceStrategy, +}; + +const vectorStore = await MariaDBStore.initialize( + new OpenAIEmbeddings(), + config +); + +await vectorStore.addDocuments([ + { + pageContent: "what's this", + metadata: { country: "EN", year: 2021, city: "london" }, + }, + { pageContent: "Cat drinks milk", metadata: { country: "GE", year: 2020 } }, +]); + +const results = await vectorStore.similaritySearch("water", 1); + +console.log(results); +// [ Document { pageContent: 'Cat drinks milk', metadata: { country: 'GE', year: 2020 }, id: ... } ] + +// Filtering is supported +const results2 = await vectorStore.similaritySearch("water", 1, { + b: { $gte: { year: 2021 } }, +}); +console.log(results2); +// [ Document { pageContent: 'what's this', metadata: { country: 'EN', year: 2021, city: 'london' } } ] + +await vectorStore.delete({ filter: { b: { $gte: { year: 2021 } } } }); + +const results3 = await vectorStore.similaritySearch("water", 1); +console.log(results3); +// [ Document { pageContent: 'Cat drinks milk', metadata: { country: 'GE', year: 2020 }, id: ... 
} ] + +await vectorStore.end(); diff --git a/examples/src/indexes/vector_stores/mariadb_vectorstore/mariadb_pool.ts b/examples/src/indexes/vector_stores/mariadb_vectorstore/mariadb_pool.ts new file mode 100644 index 000000000000..1a48e721872a --- /dev/null +++ b/examples/src/indexes/vector_stores/mariadb_vectorstore/mariadb_pool.ts @@ -0,0 +1,65 @@ +import { OpenAIEmbeddings } from "@langchain/openai"; +import { MariaDBStore } from "@langchain/community/vectorstores/mariadb"; +import mariadb from "mariadb"; + +// First, follow set-up instructions at +// https://js.langchain.com/docs/modules/indexes/vector_stores/integrations/mariadb + +const reusablePool = mariadb.createPool({ + host: "127.0.0.1", + port: 3306, + user: "myuser", + password: "ChangeMe", + database: "api", +}); + +const originalConfig = { + pool: reusablePool, + tableName: "testlangchain", + collectionName: "sample", + collectionTableName: "collections", + columns: { + idColumnName: "id", + vectorColumnName: "vect", + contentColumnName: "content", + metadataColumnName: "metadata", + }, +}; + +// Set up the DB. +// Can skip this step if you've already initialized the DB. +const vectorStore = await MariaDBStore.initialize( + new OpenAIEmbeddings(), + originalConfig +); +// const vectorStore = new MariaDBStore(new OpenAIEmbeddings(), originalConfig); + +await vectorStore.addDocuments([ + { pageContent: "what's this", metadata: { a: 2 } }, + { pageContent: "Cat drinks milk", metadata: { a: 1 } }, +]); + +const results = await vectorStore.similaritySearch("water", 1); + +console.log(results); +// [ Document { pageContent: 'Cat drinks milk', metadata: { a: 1 }, id: ... 
} ] + +const vectorStore2 = new MariaDBStore(new OpenAIEmbeddings(), { + pool: reusablePool, + tableName: "testlangchain", + collectionTableName: "collections", + collectionName: "some_other_collection", + columns: { + idColumnName: "id", + vectorColumnName: "vect", + contentColumnName: "content", + metadataColumnName: "metadata", + }, +}); + +const results2 = await vectorStore2.similaritySearch("water", 1); + +console.log(results2); +// [] + +await reusablePool.end(); diff --git a/libs/langchain-community/.gitignore b/libs/langchain-community/.gitignore index ae0258bd42da..4dcae0b0e154 100644 --- a/libs/langchain-community/.gitignore +++ b/libs/langchain-community/.gitignore @@ -402,6 +402,10 @@ vectorstores/libsql.cjs vectorstores/libsql.js vectorstores/libsql.d.ts vectorstores/libsql.d.cts +vectorstores/mariadb.cjs +vectorstores/mariadb.js +vectorstores/mariadb.d.ts +vectorstores/mariadb.d.cts vectorstores/milvus.cjs vectorstores/milvus.js vectorstores/milvus.d.ts diff --git a/libs/langchain-community/langchain.config.js b/libs/langchain-community/langchain.config.js index dca43ba96d44..2b17c88ac3e0 100644 --- a/libs/langchain-community/langchain.config.js +++ b/libs/langchain-community/langchain.config.js @@ -138,6 +138,7 @@ export const config = { "vectorstores/hanavector": "vectorstores/hanavector", "vectorstores/lancedb": "vectorstores/lancedb", "vectorstores/libsql": "vectorstores/libsql", + "vectorstores/mariadb": "vectorstores/mariadb", "vectorstores/milvus": "vectorstores/milvus", "vectorstores/momento_vector_index": "vectorstores/momento_vector_index", "vectorstores/mongodb_atlas": "vectorstores/mongodb_atlas", @@ -406,6 +407,7 @@ export const config = { "vectorstores/hanavector", "vectorstores/lancedb", "vectorstores/libsql", + "vectorstores/mariadb", "vectorstores/milvus", "vectorstores/momento_vector_index", "vectorstores/mongodb_atlas", diff --git a/libs/langchain-community/package.json b/libs/langchain-community/package.json index 
b70a9c28de34..2d777e067327 100644 --- a/libs/langchain-community/package.json +++ b/libs/langchain-community/package.json @@ -112,6 +112,7 @@ "@tensorflow/tfjs-backend-cpu": "^3", "@tensorflow/tfjs-converter": "^3.6.0", "@tensorflow/tfjs-core": "^3.6.0", + "@testcontainers/mariadb": "^10.16.0", "@tsconfig/recommended": "^1.0.2", "@types/better-sqlite3": "^7.6.10", "@types/crypto-js": "^4.2.2", @@ -188,6 +189,7 @@ "lodash": "^4.17.21", "lunary": "^0.7.10", "mammoth": "^1.6.0", + "mariadb": "^3.4.0", "mongodb": "^5.2.0", "mysql2": "^3.9.8", "neo4j-driver": "^5.17.0", @@ -320,6 +322,7 @@ "lodash": "^4.17.21", "lunary": "^0.7.10", "mammoth": "^1.6.0", + "mariadb": "^3.4.0", "mongodb": ">=5.2.0", "mysql2": "^3.9.8", "neo4j-driver": "*", @@ -634,6 +637,9 @@ "mammoth": { "optional": true }, + "mariadb": { + "optional": true + }, "mongodb": { "optional": true }, @@ -1626,6 +1632,15 @@ "import": "./vectorstores/libsql.js", "require": "./vectorstores/libsql.cjs" }, + "./vectorstores/mariadb": { + "types": { + "import": "./vectorstores/mariadb.d.ts", + "require": "./vectorstores/mariadb.d.cts", + "default": "./vectorstores/mariadb.d.ts" + }, + "import": "./vectorstores/mariadb.js", + "require": "./vectorstores/mariadb.cjs" + }, "./vectorstores/milvus": { "types": { "import": "./vectorstores/milvus.d.ts", @@ -3627,6 +3642,10 @@ "vectorstores/libsql.js", "vectorstores/libsql.d.ts", "vectorstores/libsql.d.cts", + "vectorstores/mariadb.cjs", + "vectorstores/mariadb.js", + "vectorstores/mariadb.d.ts", + "vectorstores/mariadb.d.cts", "vectorstores/milvus.cjs", "vectorstores/milvus.js", "vectorstores/milvus.d.ts", diff --git a/libs/langchain-community/src/load/import_map.ts b/libs/langchain-community/src/load/import_map.ts index bc70dbf860c5..55f59ffbb083 100644 --- a/libs/langchain-community/src/load/import_map.ts +++ b/libs/langchain-community/src/load/import_map.ts @@ -43,6 +43,7 @@ export * as llms__friendli from "../llms/friendli.js"; export * as llms__ollama from 
"../llms/ollama.js"; export * as llms__togetherai from "../llms/togetherai.js"; export * as llms__yandex from "../llms/yandex.js"; +export * as vectorstores__mariadb from "../vectorstores/mariadb.js"; export * as vectorstores__prisma from "../vectorstores/prisma.js"; export * as vectorstores__turbopuffer from "../vectorstores/turbopuffer.js"; export * as vectorstores__vectara from "../vectorstores/vectara.js"; diff --git a/libs/langchain-community/src/vectorstores/mariadb.ts b/libs/langchain-community/src/vectorstores/mariadb.ts new file mode 100644 index 000000000000..9d385e7f0b8b --- /dev/null +++ b/libs/langchain-community/src/vectorstores/mariadb.ts @@ -0,0 +1,814 @@ +import mariadb, { type Pool, type PoolConfig } from "mariadb"; +import { VectorStore } from "@langchain/core/vectorstores"; +import type { EmbeddingsInterface } from "@langchain/core/embeddings"; +import { Document } from "@langchain/core/documents"; +import { getEnvironmentVariable } from "@langchain/core/utils/env"; + +type Metadata = Record; + +export type DistanceStrategy = "COSINE" | "EUCLIDEAN"; + +const STANDARD_SIMPLE_OPERATOR = new Map([ + ["$eq", "="], + ["$ne", "!="], + ["$lt", "<"], + ["$lte", "<="], + ["$gt", ">"], + ["$gte", ">="], +]); + +const STANDARD_LIST_OPERATOR = new Map([ + ["$in", "in"], + ["$nin", "not in"], +]); + +const STANDARD_BETWEEN_OPERATOR = new Map([ + ["$like", "like"], + ["$nlike", "not like"], +]); + +const GROUP_OPERATORS = new Map([ + ["$or", "or"], + ["$and", "and"], + ["$not", "not"], +]); + +const SUPPORTED_OPERATORS = new Map([ + ...STANDARD_SIMPLE_OPERATOR, + ...STANDARD_LIST_OPERATOR, + ...STANDARD_BETWEEN_OPERATOR, + ...GROUP_OPERATORS, +]); + +/** + * Interface that defines the arguments required to create a + * `MariaDBStore` instance. It includes MariaDB connection options, + * table name and verbosity level. 
+ */ +export interface MariaDBStoreArgs { + connectionOptions?: PoolConfig; + pool?: Pool; + tableName?: string; + collectionTableName?: string; + collectionName?: string; + collectionMetadata?: Metadata | null; + schemaName?: string | null; + columns?: { + idColumnName?: string; + vectorColumnName?: string; + contentColumnName?: string; + metadataColumnName?: string; + }; + verbose?: boolean; + /** + * The amount of documents to chunk by when + * adding vectors. + * @default 500 + */ + chunkSize?: number; + ids?: string[]; + distanceStrategy?: DistanceStrategy; +} + +/** + * MariaDB vector store integration. + * + * Setup: + * Install `@langchain/community` and `mariadb`. + * + * If you wish to generate ids, you should also install the `uuid` package. + * + * ```bash + * npm install @langchain/community mariadb uuid + * ``` + * + * ## [Constructor args](https://api.js.langchain.com/classes/_langchain_community.vectorstores_mariadb.MariaDB.html#constructor) + * + *
+ * Instantiate + * + * ```typescript + * import { + * MariaDBStore, + * DistanceStrategy, + * } from "@langchain/community/vectorstores/mariadb"; + * + * // Or other embeddings + * import { OpenAIEmbeddings } from "@langchain/openai"; + * import { PoolConfig } from "mariadb"; + * + * const embeddings = new OpenAIEmbeddings({ + * model: "text-embedding-3-small", + * }); + * + * // Sample config + * const config = { + * connectionOptions: { + * host: "127.0.0.1", + * port: 3306, + * user: "myuser", + * password: "ChangeMe", + * database: "api", + * } as PoolConfig, + * tableName: "testlangchainjs", + * columns: { + * idColumnName: "id", + * vectorColumnName: "vector", + * contentColumnName: "content", + * metadataColumnName: "metadata", + * }, + * // supported distance strategies: COSINE (default) or EUCLIDEAN + * distanceStrategy: "COSINE" as DistanceStrategy, + * }; + * + * const vectorStore = await MariaDBStore.initialize(embeddings, config); + * ``` + *
+ * + *
+ * + *
+ * Add documents + * + * ```typescript + * import type { Document } from '@langchain/core/documents'; + * + * const document1 = { pageContent: "foo", metadata: { baz: "bar" } }; + * const document2 = { pageContent: "thud", metadata: { bar: "baz" } }; + * const document3 = { pageContent: "i will be deleted :(", metadata: {} }; + * + * const documents: Document[] = [document1, document2, document3]; + * const ids = ["1", "2", "3"]; + * await vectorStore.addDocuments(documents, { ids }); + * ``` + *
+ * + *
+ * + *
+ * Delete documents + * + * ```typescript + * await vectorStore.delete({ ids: ["3"] }); + * ``` + *
+ * + *
+ * + *
+ * Similarity search + * + * ```typescript + * const results = await vectorStore.similaritySearch("thud", 1); + * for (const doc of results) { + * console.log(`* ${doc.pageContent} [${JSON.stringify(doc.metadata, null)}]`); + * } + * // Output: * thud [{"baz":"bar"}] + * ``` + *
+ * + *
+ * + * + *
+ * Similarity search with filter + * + * ```typescript + * const resultsWithFilter = await vectorStore.similaritySearch("thud", 1, {"country": "BG"}); + * + * for (const doc of resultsWithFilter) { + * console.log(`* ${doc.pageContent} [${JSON.stringify(doc.metadata, null)}]`); + * } + * // Output: * foo [{"baz":"bar"}] + * ``` + *
+ * + *
+ * + * + *
+ * Similarity search with score + * + * ```typescript + * const resultsWithScore = await vectorStore.similaritySearchWithScore("qux", 1); + * for (const [doc, score] of resultsWithScore) { + * console.log(`* [SIM=${score.toFixed(6)}] ${doc.pageContent} [${JSON.stringify(doc.metadata, null)}]`); + * } + * // Output: * [SIM=0.000000] qux [{"bar":"baz","baz":"bar"}] + * ``` + *
+ * + *
+ * + *
+ * As a retriever + * + * ```typescript + * const retriever = vectorStore.asRetriever({ + * searchType: "mmr", // Leave blank for standard similarity search + * k: 1, + * }); + * const resultAsRetriever = await retriever.invoke("thud"); + * console.log(resultAsRetriever); + * + * // Output: [Document({ metadata: { "baz":"bar" }, pageContent: "thud" })] + * ``` + *
+ * + *
+ */ +export class MariaDBStore extends VectorStore { + tableName: string; + + collectionTableName?: string; + + collectionName = "langchain"; + + collectionId?: string; + + collectionMetadata: Metadata | null; + + schemaName: string | null; + + idColumnName: string; + + vectorColumnName: string; + + contentColumnName: string; + + metadataColumnName: string; + + _verbose?: boolean; + + pool: Pool; + + chunkSize = 500; + + distanceStrategy: DistanceStrategy; + + constructor(embeddings: EmbeddingsInterface, config: MariaDBStoreArgs) { + super(embeddings, config); + this.tableName = this.escapeId(config.tableName ?? "langchain", false); + if ( + config.collectionName !== undefined && + config.collectionTableName === undefined + ) { + throw new Error( + `If supplying a "collectionName", you must also supply a "collectionTableName".` + ); + } + + this.collectionTableName = config.collectionTableName + ? this.escapeId(config.collectionTableName, false) + : undefined; + + this.collectionName = config.collectionName + ? this.escapeId(config.collectionName, false) + : "langchaincol"; + + this.collectionMetadata = config.collectionMetadata ?? null; + this.schemaName = config.schemaName + ? this.escapeId(config.schemaName, false) + : null; + + this.vectorColumnName = this.escapeId( + config.columns?.vectorColumnName ?? "embedding", + false + ); + this.contentColumnName = this.escapeId( + config.columns?.contentColumnName ?? "text", + false + ); + this.idColumnName = this.escapeId( + config.columns?.idColumnName ?? "id", + false + ); + this.metadataColumnName = this.escapeId( + config.columns?.metadataColumnName ?? "metadata", + false + ); + + if (!config.connectionOptions && !config.pool) { + throw new Error( + "You must provide either a `connectionOptions` object or a `pool` instance." 
+ ); + } + + const langchainVerbose = getEnvironmentVariable("LANGCHAIN_VERBOSE"); + + if (langchainVerbose === "true") { + this._verbose = true; + } else if (langchainVerbose === "false") { + this._verbose = false; + } else { + this._verbose = config.verbose; + } + + if (config.pool) { + this.pool = config.pool; + } else { + const poolConf = { ...config.connectionOptions, rowsAsArray: true }; + // add query to log if verbose + if (this._verbose) poolConf.logger = { query: console.log }; + this.pool = mariadb.createPool(poolConf); + } + this.chunkSize = config.chunkSize ?? 500; + + this.distanceStrategy = + config.distanceStrategy ?? ("COSINE" as DistanceStrategy); + } + + get computedTableName() { + return this.schemaName == null + ? this.tableName + : `${this.schemaName}.${this.tableName}`; + } + + get computedCollectionTableName() { + // Names are already escaped by escapeId; MariaDB's default SQL mode treats double quotes as string delimiters, so none are added here. + return this.schemaName == null + ? `${this.collectionTableName}` + : `${this.schemaName}.${this.collectionTableName}`; + } + + /** + * Escape an identifier (schema, table or column name) for use in SQL. + * + * @param identifier identifier value + * @param alwaysQuote when true, quote the identifier even if quoting is not required + */ + private escapeId(identifier: string, alwaysQuote: boolean): string { + if (!identifier || identifier === "") + throw new Error("Identifier is required"); + + const len = identifier.length; + const simpleIdentifier = /^[0-9a-zA-Z$_]*$/; + if (simpleIdentifier.test(identifier)) { + if (len < 1 || len > 64) { + throw new Error("Invalid identifier length"); + } + if (alwaysQuote) return `\`${identifier}\``; + + // Identifier names may begin with a numeral, but can't only contain numerals unless quoted.
+ if (/^\d+$/.test(identifier)) { + // identifier containing only numerals must be quoted + return `\`${identifier}\``; + } + // a simple identifier does not need quoting + return identifier; + } else { + if (identifier.includes("\u0000")) { + throw new Error("Invalid name - containing u0000 character"); + } + let ident = identifier; + if (/^`.+`$/.test(identifier)) { + ident = identifier.substring(1, identifier.length - 1); + } + if (len < 1 || len > 64) { + throw new Error("Invalid identifier length"); + } + return `\`${ident.replace(/`/g, "``")}\``; + } + } + + // Strip every character that is not alphanumeric or an underscore (used to build index and constraint names). + private printable(definition: string): string { + return definition.replaceAll(/[^0-9a-zA-Z_]/g, ""); + } + + /** + * Static method to create a new `MariaDBStore` instance from a + * connection. It creates the vector table (and the collection table, if + * one is configured) when they do not exist, then loads the collection id. + * + * @param embeddings - Embeddings instance. + * @param config - `MariaDBStoreArgs` instance + * @param config.dimensions Number of dimensions in your vector data type. Defaults to 1536. + * @returns A new instance of `MariaDBStore`. + */ + static async initialize( + embeddings: EmbeddingsInterface, + config: MariaDBStoreArgs & { dimensions?: number } + ): Promise<MariaDBStore> { + const { dimensions, ...rest } = config; + const mariadbStore = new MariaDBStore(embeddings, rest); + await mariadbStore.ensureTableInDatabase(dimensions); + await mariadbStore.ensureCollectionTableInDatabase(); + await mariadbStore.loadCollectionId(); + + return mariadbStore; + } + + /** + * Static method to create a new `MariaDBStore` instance from an + * array of texts and their metadata. It converts the texts into + * `Document` instances and adds them to the store. + * + * @param texts - Array of texts. + * @param metadatas - Array of metadata objects or a single metadata object. + * @param embeddings - Embeddings instance. + * @param dbConfig - `MariaDBStoreArgs` instance.
+ * @returns Promise that resolves with a new instance of `MariaDBStore`. + */ + static async fromTexts( + texts: string[], + metadatas: object[] | object, + embeddings: EmbeddingsInterface, + dbConfig: MariaDBStoreArgs & { dimensions?: number } + ): Promise { + const docs = []; + for (let i = 0; i < texts.length; i += 1) { + const metadata = Array.isArray(metadatas) ? metadatas[i] : metadatas; + const newDoc = new Document({ + pageContent: texts[i], + metadata, + }); + docs.push(newDoc); + } + + return MariaDBStore.fromDocuments(docs, embeddings, dbConfig); + } + + /** + * Static method to create a new `MariaDBStore` instance from an + * array of `Document` instances. It adds the documents to the store. + * + * @param docs - Array of `Document` instances. + * @param embeddings - Embeddings instance. + * @param dbConfig - `MariaDBStoreArgs` instance. + * @returns Promise that resolves with a new instance of `MariaDBStore`. + */ + static async fromDocuments( + docs: Document[], + embeddings: EmbeddingsInterface, + dbConfig: MariaDBStoreArgs & { dimensions?: number } + ): Promise { + const instance = await MariaDBStore.initialize(embeddings, dbConfig); + await instance.addDocuments(docs, { ids: dbConfig.ids }); + return instance; + } + + _vectorstoreType(): string { + return "mariadb"; + } + + /** + * Method to add documents to the vector store. It converts the documents into + * vectors, and adds them to the store. + * + * @param documents - Array of `Document` instances. + * @param options - Optional arguments for adding documents + * @returns Promise that resolves when the documents have been added. 
+ */ + async addDocuments( + documents: Document[], + options?: { ids?: string[] } + ): Promise { + const texts = documents.map(({ pageContent }) => pageContent); + + return this.addVectors( + await this.embeddings.embedDocuments(texts), + documents, + options + ); + } + + /** + * Inserts a row for the collectionName provided at initialization if it does not + * exist and set the collectionId. + */ + private async loadCollectionId(): Promise { + if (this.collectionId) { + return; + } + + if (this.collectionTableName) { + const queryResult = await this.pool.query( + { + sql: `SELECT uuid from ${this.computedCollectionTableName} WHERE label = ?`, + rowsAsArray: true, + }, + [this.collectionName] + ); + if (queryResult.length > 0) { + this.collectionId = queryResult[0][0]; + } else { + const insertString = `INSERT INTO ${this.computedCollectionTableName}(label, cmetadata) VALUES (?, ?) RETURNING uuid`; + const insertResult = await this.pool.query( + { sql: insertString, rowsAsArray: true }, + [this.collectionName, this.collectionMetadata] + ); + this.collectionId = insertResult[0][0]; + } + } + } + + /** + * Method to add vectors to the vector store. It converts the vectors into + * rows and inserts them into the database. + * + * @param vectors - Array of vectors. + * @param documents - Array of `Document` instances. + * @param options - Optional arguments for adding documents + * @returns Promise that resolves when the vectors have been added. + */ + async addVectors( + vectors: number[][], + documents: Document[], + options?: { ids?: string[] } + ): Promise { + const ids = options?.ids; + + // Either all documents have ids or none of them do to avoid confusion. + if (ids !== undefined && ids.length !== vectors.length) { + throw new Error( + "The number of ids must match the number of vectors provided." 
+ ); + } + await this.loadCollectionId(); + + const insertQuery = `INSERT INTO ${this.computedTableName}(${ + this.idColumnName + },${this.contentColumnName},${this.metadataColumnName},${ + this.vectorColumnName + }${this.collectionId ? ",collection_id" : ""}) VALUES (${ + ids ? "?" : "UUID_v7()" + }, ?, ?, ?${this.collectionId ? ", ?" : ""})`; + + try { + const batchParams = []; + for (let i = 0; i < vectors.length; i += 1) { + const param = [ + ids ? ids[i] : null, + documents[i].pageContent, + documents[i].metadata, + this.getFloat32Buffer(vectors[i]), + this.collectionId, + ]; + // drop the id slot when the id is generated by UUID_v7() + if (!ids) param.shift(); + // drop the collection slot when no collection is used + if (!this.collectionId) param.pop(); + batchParams.push(param); + } + await this.pool.batch(insertQuery, batchParams); + } catch (e) { + console.error(e); + throw new Error(`Error inserting: ${(e as Error).message}`); + } + } + + /** + * Convert a float array to a binary buffer of 32-bit floats + * @param vector embedding value + * @private + */ + private getFloat32Buffer(vector: number[]) { + return Buffer.from(new Float32Array(vector).buffer); + } + + /** + * Method to delete documents from the vector store. It deletes the + * documents that match the provided ids or, when no ids are given, the + * provided metadata filter. + * + * @param params - Object with either `ids` (array of ids) or `filter` (metadata filter) + * @returns Promise that resolves when the documents have been deleted. + * @example + * await vectorStore.delete({ ids: ["id1", "id2"] }); + */ + async delete(params: { + ids?: string[]; + filter?: Record; + }): Promise<void> { + const { ids, filter } = params; + + if (!(ids || filter)) { + throw new Error( + "You must specify either ids or a filter when deleting documents." + ); + } + await this.loadCollectionId(); + + if (ids) { + // delete by ids + await this.pool.query( + `DELETE FROM ${this.computedTableName} WHERE ${ + this.idColumnName + } IN (?) ${this.collectionId ? " AND collection_id = ?"
: ""}`, + [ids, this.collectionId] + ); + } else if (filter) { + // delete by filter + const [filterPart, params] = this.filterConverter(filter); + if (filterPart.length === 0) throw new Error("Wrong filter."); + await this.pool.query( + `DELETE FROM ${this.computedTableName} WHERE ${filterPart} ${ + this.collectionId ? " AND collection_id = ?" : "" + }`, + [...params, this.collectionId] + ); + } + } + + private filterConverter(filter?: Record): [string, any[]] { + if (!filter) return ["", []]; + const _filter: Record = filter ?? {}; + const parameters: any[] = []; + let sqlFilter = this.subFilterConverter(_filter, parameters, "$and"); + if (sqlFilter.charAt(0) === "(") { + sqlFilter = sqlFilter.substring(1, sqlFilter.length - 1); + } + return [sqlFilter, parameters]; + } + + private subFilterConverter( + filter: Record, + parameters: any[], + groupOperator: string + ): string { + const sqlFilterPart = []; + + for (const [key, value] of Object.entries(filter)) { + if (typeof value === "object" && value !== null) { + // eslint-disable-next-line @typescript-eslint/no-explicit-any + const _value: Record = value; + for (const [type, subvalue] of Object.entries(_value)) { + let realvalue = subvalue; + if (STANDARD_LIST_OPERATOR.has(type)) { + if (!Array.isArray(realvalue)) { + // a single string or number is accepted and wrapped into a one-element list + if ( + typeof realvalue !== "string" && + typeof realvalue !== "number" + ) { + throw new Error( + "value for an in/not in filter is expected to be an array, a string or a number" + ); + } + realvalue = [realvalue]; + } + + const placeholders = realvalue.map(() => "?").join(","); + sqlFilterPart.push( + `JSON_VALUE(${ + this.metadataColumnName + }, '$.${key}') ${STANDARD_LIST_OPERATOR.get( + type + )} (${placeholders})` + ); + parameters.push(...realvalue); + } else if (GROUP_OPERATORS.has(type)) { + sqlFilterPart.push( + this.subFilterConverter(realvalue, parameters, type) + ); + } else if (SUPPORTED_OPERATORS.has(type)) { + sqlFilterPart.push( + `JSON_VALUE(${ + this.metadataColumnName + }, '$.${key}')
${SUPPORTED_OPERATORS.get(type)} ?` + ); + parameters.push(realvalue); + } else { + throw new Error( + `unknown operator "${type}", must be one of ${Array.from( + SUPPORTED_OPERATORS.keys() + ).join(", ")}` + ); + } + } + } else { + sqlFilterPart.push( + `JSON_VALUE(${this.metadataColumnName}, '$.${key}') = ?` + ); + parameters.push(value); + } + } + if (sqlFilterPart.length > 1) { + return `(${sqlFilterPart.join( + " " + GROUP_OPERATORS.get(groupOperator) + " " + )})`; + } else { + return sqlFilterPart[0]; + } + } + + /** + * Method to perform a similarity search in the vector store. It returns + * the `k` most similar documents to the query vector, along with their + * scores. The score is the vector distance computed with the configured + * distance strategy, so a lower value means a closer match. + * + * @param query - Query vector. + * @param k - Number of most similar documents to return. + * @param filter - Optional filter to apply to the search. + * @returns Promise that resolves with an array of tuples, each containing a `Document` and its distance to the query vector. + */ + async similaritySearchVectorWithScore( + query: number[], + k: number, + filter?: Record + ): Promise<[Document, number][]> { + const parameters: unknown[] = [this.getFloat32Buffer(query)]; + const whereClauses = []; + + await this.loadCollectionId(); + + if (this.collectionId) { + whereClauses.push("collection_id = ?"); + parameters.push(this.collectionId); + } + + if (filter) { + const [filterPart, params] = this.filterConverter(filter); + whereClauses.push(filterPart); + parameters.push(...params); + } + + // limit + parameters.push(k); + + const whereClause = whereClauses.length + ? `WHERE ${whereClauses.join(" AND ")}` + : ""; + + const queryString = `SELECT ${this.idColumnName},${this.contentColumnName},${this.metadataColumnName},VEC_DISTANCE_${this.distanceStrategy}(${this.vectorColumnName}, ?)
as distance FROM ${this.computedTableName} ${whereClause} ORDER BY distance ASC LIMIT ?`; + + const documents = await this.pool.execute( + { sql: queryString, rowsAsArray: true }, + parameters + ); + + const results = [] as [Document, number][]; + for (const doc of documents) { + if (doc[3] != null && doc[1] != null) { + const document = new Document({ + id: doc[0], + pageContent: doc[1], + metadata: doc[2], + }); + results.push([document, doc[3]]); + } + } + return results; + } + + /** + * Method to ensure the existence of the table in the database. It creates + * the table if it does not already exist. + * @param dimensions Number of dimensions in your vector data type. Default to 1536. + * @returns Promise that resolves when the table has been ensured. + */ + async ensureTableInDatabase(dimensions = 1536): Promise { + const tableQuery = `CREATE TABLE IF NOT EXISTS ${this.computedTableName}(${ + this.idColumnName + } UUID NOT NULL DEFAULT UUID_v7() PRIMARY KEY,${ + this.contentColumnName + } TEXT,${this.metadataColumnName} JSON,${ + this.vectorColumnName + } VECTOR(${dimensions}) NOT NULL, VECTOR INDEX ${this.printable( + this.tableName + "_" + this.vectorColumnName + )}_idx (${this.vectorColumnName}) ) ENGINE=InnoDB`; + await this.pool.query(tableQuery); + } + + /** + * Method to ensure the existence of the collection table in the database. + * It creates the table if it does not already exist. + * + * @returns Promise that resolves when the collection table has been ensured. 
+ */ + async ensureCollectionTableInDatabase(): Promise { + try { + if (this.collectionTableName != null) { + await Promise.all([ + this.pool.query( + `CREATE TABLE IF NOT EXISTS ${ + this.computedCollectionTableName + }(uuid UUID NOT NULL DEFAULT UUID_v7() PRIMARY KEY, + label VARCHAR(256), cmetadata JSON, UNIQUE KEY idx_${this.printable( + this.collectionTableName + )}_label + (label))` + ), + this.pool.query( + `ALTER TABLE ${this.computedTableName} + ADD COLUMN IF NOT EXISTS collection_id uuid, + ADD CONSTRAINT FOREIGN KEY IF NOT EXISTS ${this.printable( + this.tableName + )}_collection_id_fkey (collection_id) + REFERENCES ${ + this.computedCollectionTableName + }(uuid) ON DELETE CASCADE` + ), + ]); + } + } catch (e) { + console.error(e); + throw new Error( + `Error adding column or creating index: ${(e as Error).message}` + ); + } + } + + /** + * Close the pool. + * + * @returns Promise that resolves when the pool is terminated. + */ + async end(): Promise { + return this.pool.end(); + } +} diff --git a/libs/langchain-community/src/vectorstores/tests/mariadb.int.test.ts b/libs/langchain-community/src/vectorstores/tests/mariadb.int.test.ts new file mode 100644 index 000000000000..f5badb60a16a --- /dev/null +++ b/libs/langchain-community/src/vectorstores/tests/mariadb.int.test.ts @@ -0,0 +1,301 @@ +import { + MariaDbContainer, + StartedMariaDbContainer, +} from "@testcontainers/mariadb"; +import { OpenAIEmbeddings } from "@langchain/openai"; +import { type Pool, PoolConfig } from "mariadb"; +import { MariaDBStore, MariaDBStoreArgs } from "../mariadb.js"; + +const isFullyQualifiedTableExists = async ( + pool: Pool, + schema: string, + tableName: string +): Promise => { + const sql = + "SELECT EXISTS (SELECT * FROM information_schema.tables WHERE table_schema = ? AND table_name = ?) 
as results"; + const res = await pool.query(sql, [schema, tableName]); + return res[0][0] as boolean; +}; +const removeQuotes = (field: string): string => { + if (field.charAt(0) === "`") return field.substring(1, field.length - 1); + return field; +}; +const areColumnsExisting = async ( + pool: Pool, + schema: string, + tableName: string, + fieldNames: string[] +): Promise => { + const sql = + "SELECT EXISTS (SELECT * FROM information_schema.columns WHERE table_schema= ? AND table_name = ? AND column_name = ?)"; + + for (let i = 0; i < fieldNames.length; i += 1) { + const res = await pool.query(sql, [ + schema, + removeQuotes(tableName), + removeQuotes(fieldNames[i]), + ]); + if (res[0][0]) continue; + return false; + } + return true; +}; + +describe("MariaDBVectorStore", () => { + let container: StartedMariaDbContainer; + + beforeAll(async () => { + container = await new MariaDbContainer("mariadb:11.7-rc").start(); + }); + + afterAll(async () => { + await container.stop(); + }); + + describe("automatic table creation", () => { + it.each([ + ["myTable", "myId", "myVector", "myContent", "myMetadata", undefined], + [ + "myTable 2", + "myId 2", + "myVector 2", + "myContent 2", + "myMetadata 2", + undefined, + ], + [ + "myTable", + "myId", + "myVector", + "myContent", + "myMetadata", + "myCollectionTableName", + ], + [ + "myTable` 2", + "myId` 2", + "myVector` 2", + "myContent` 2", + "myMetadata` 2", + "myCollectionTableName` 2", + ], + ])( + "automatic table %p %p %p %p %p", + async ( + tableName: string, + idColumnName: string, + vectorColumnName: string, + contentColumnName: string, + metadataColumnName: string, + collectionTableName?: string + ) => { + const localStore = await MariaDBStore.initialize( + new OpenAIEmbeddings(), + { + connectionOptions: { + host: container.getHost(), + port: container.getFirstMappedPort(), + user: container.getUsername(), + password: container.getUserPassword(), + database: container.getDatabase(), + } as PoolConfig, + tableName, + 
columns: { + idColumnName, + vectorColumnName, + contentColumnName, + metadataColumnName, + }, + collectionTableName, + distanceStrategy: "EUCLIDEAN", + } as MariaDBStoreArgs + ); + // The checks are async: await them, since an un-awaited Promise is always truthy. + expect( + await isFullyQualifiedTableExists( + localStore.pool, + container.getDatabase(), + tableName + ) + ).toBeTruthy(); + expect( + await areColumnsExisting( + localStore.pool, + container.getDatabase(), + tableName, + [idColumnName, vectorColumnName, contentColumnName, metadataColumnName] + ) + ).toBeTruthy(); + await localStore.similaritySearch("hello", 10); + await localStore.delete({ + ids: ["63ae8c92-799a-11ef-98b2-f859713e4be4"], + }); + const documents = [ + { pageContent: "hello", metadata: { a: 2023, country: "US" } }, + ]; + await localStore.addDocuments(documents); + await localStore.pool.query("DROP TABLE " + localStore.tableName); + } + ); + }); + + describe("without collection", () => { + let store: MariaDBStore; + + beforeAll(async () => { + store = await MariaDBStore.initialize(new OpenAIEmbeddings(), { + connectionOptions: { + type: "mariadb", + host: container.getHost(), + port: container.getFirstMappedPort(), + user: container.getUsername(), + password: container.getUserPassword(), + database: container.getDatabase(), + } as PoolConfig, + } as MariaDBStoreArgs); + }); + + const documents = [ + { pageContent: "hello", metadata: { a: 2023, country: "US" } }, + { pageContent: "Cat drinks milk", metadata: { a: 2025, country: "EN" } }, + { pageContent: "hi", metadata: { a: 2025, country: "FR" } }, + ]; + const ids = [ + "cd41294a-afb0-11df-bc9b-00241dd75637", + "a2443495-1b94-415b-b6fa-fe8e79ba4812", + "63ae8c92-799a-11ef-98b2-f859713e4be4", + ]; + beforeEach(async () => { + await store.pool.query("TRUNCATE TABLE " + store.tableName); + await store.addDocuments(documents, { ids }); + }); + test("similarity limit", async () => { + let results = await store.similaritySearch("hello", 10); + expect(results.length).toEqual(3); + expect(results[0].pageContent).toEqual("hello"); +
expect(results[0].metadata.a).toEqual(2023); + + results = await store.similaritySearch("hello", 1); + expect(results.length).toEqual(1); + expect(results[0].pageContent).toEqual("hello"); + expect(results[0].metadata.a).toEqual(2023); + }); + + test("similarity with filter", async () => { + let results = await store.similaritySearch("hi", 10, { a: 2025 }); + expect(results.length).toEqual(2); + expect(results[0].pageContent).toEqual("hi"); + expect(results[0].metadata.a).toEqual(2025); + + results = await store.similaritySearch("hi", 10, { + a: { $gte: 2025 }, + country: { $in: ["GE", "FR"] }, + }); + expect(results.length).toEqual(1); + expect(results[0].pageContent).toEqual("hi"); + expect(results[0].metadata.a).toEqual(2025); + }); + + test("deletion with filter", async () => { + try { + await store.delete({}); + throw new Error("expected to fails"); + } catch (e) { + expect((e as Error).message).toEqual( + "You must specify either ids or a filter when deleting documents." + ); + } + + await store.delete({ filter: { a: { $eq: 2023 } } }); + let res = await store.pool.query( + "SELECT COUNT(*) as a FROM " + store.tableName + ); + expect(res[0][0]).toEqual(2n); + + await store.delete({ ids: ["63ae8c92-799a-11ef-98b2-f859713e4be4"] }); + res = await store.pool.query("SELECT COUNT(*) FROM " + store.tableName); + expect(res[0][0]).toEqual(1n); + }); + }); + + describe("with collection", () => { + let store: MariaDBStore; + + beforeAll(async () => { + store = await MariaDBStore.initialize(new OpenAIEmbeddings(), { + connectionOptions: { + type: "mariadb", + host: container.getHost(), + port: container.getFirstMappedPort(), + user: container.getUsername(), + password: container.getUserPassword(), + database: container.getDatabase(), + } as PoolConfig, + collectionTableName: "myCollectionTable", + } as MariaDBStoreArgs); + }); + + const documents = [ + { pageContent: "hello", metadata: { a: 2023, country: "US" } }, + { pageContent: "Cat drinks milk", metadata: { a: 
2025, country: "EN" } }, + { pageContent: "hi", metadata: { a: 2025, country: "FR" } }, + ]; + const ids = [ + "cd41294a-afb0-11df-bc9b-00241dd75637", + "a2443495-1b94-415b-b6fa-fe8e79ba4812", + "63ae8c92-799a-11ef-98b2-f859713e4be4", + ]; + + beforeEach(async () => { + await store.pool.query("TRUNCATE TABLE " + store.tableName); + await store.addDocuments(documents, { ids }); + }); + + test("similarity limit", async () => { + let results = await store.similaritySearch("hello", 10); + expect(results.length).toEqual(3); + expect(results[0].pageContent).toEqual("hello"); + expect(results[0].metadata.a).toEqual(2023); + + results = await store.similaritySearch("hello", 1); + expect(results.length).toEqual(1); + expect(results[0].pageContent).toEqual("hello"); + expect(results[0].metadata.a).toEqual(2023); + }); + + test("similarity with filter", async () => { + let results = await store.similaritySearch("hi", 10, { a: 2025 }); + expect(results.length).toEqual(2); + expect(results[0].pageContent).toEqual("hi"); + expect(results[0].metadata.a).toEqual(2025); + + results = await store.similaritySearch("hi", 10, { + a: { $gte: 2025 }, + country: { $in: ["GE", "FR"] }, + }); + expect(results.length).toEqual(1); + expect(results[0].pageContent).toEqual("hi"); + expect(results[0].metadata.a).toEqual(2025); + }); + + test("deletion with filter", async () => { + try { + await store.delete({}); + throw new Error("expected to fails"); + } catch (e) { + expect((e as Error).message).toEqual( + "You must specify either ids or a filter when deleting documents." 
+ ); + } + + await store.delete({ filter: { a: 2023 } }); + let res = await store.pool.query( + "SELECT COUNT(*) as a FROM " + store.tableName + ); + expect(res[0][0]).toEqual(2n); + + await store.delete({ ids: ["63ae8c92-799a-11ef-98b2-f859713e4be4"] }); + res = await store.pool.query("SELECT COUNT(*) FROM " + store.tableName); + expect(res[0][0]).toEqual(1n); + }); + }); +}); diff --git a/yarn.lock b/yarn.lock index 8d08f02c4096..686a063e20dd 100644 --- a/yarn.lock +++ b/yarn.lock @@ -5145,6 +5145,13 @@ __metadata: languageName: node linkType: hard +"@balena/dockerignore@npm:^1.0.2": + version: 1.0.2 + resolution: "@balena/dockerignore@npm:1.0.2" + checksum: 0d39f8fbcfd1a983a44bced54508471ab81aaaa40e2c62b46a9f97eac9d6b265790799f16919216db486331dedaacdde6ecbd6b7abe285d39bc50de111991699 + languageName: node + linkType: hard + "@bcherny/json-schema-ref-parser@npm:10.0.5-fork": version: 10.0.5-fork resolution: "@bcherny/json-schema-ref-parser@npm:10.0.5-fork" @@ -8482,6 +8489,7 @@ __metadata: "@tensorflow/tfjs-backend-cpu": ^3 "@tensorflow/tfjs-converter": ^3.6.0 "@tensorflow/tfjs-core": ^3.6.0 + "@testcontainers/mariadb": ^10.16.0 "@tsconfig/recommended": ^1.0.2 "@types/better-sqlite3": ^7.6.10 "@types/crypto-js": ^4.2.2 @@ -8564,6 +8572,7 @@ __metadata: lodash: ^4.17.21 lunary: ^0.7.10 mammoth: ^1.6.0 + mariadb: ^3.4.0 mongodb: ^5.2.0 mysql2: ^3.9.8 neo4j-driver: ^5.17.0 @@ -8698,6 +8707,7 @@ __metadata: lodash: ^4.17.21 lunary: ^0.7.10 mammoth: ^1.6.0 + mariadb: ^3.4.0 mongodb: ">=5.2.0" mysql2: ^3.9.8 neo4j-driver: "*" @@ -8916,6 +8926,8 @@ __metadata: optional: true mammoth: optional: true + mariadb: + optional: true mongodb: optional: true mysql2: @@ -13728,6 +13740,15 @@ __metadata: languageName: node linkType: hard +"@testcontainers/mariadb@npm:^10.16.0": + version: 10.18.0 + resolution: "@testcontainers/mariadb@npm:10.18.0" + dependencies: + testcontainers: ^10.18.0 + checksum: 
16bce7564fac58fa23bb9a40b03e230fe66d4b07b87c40742a74dc29d1e79d754c48494844baa7b565d14bc627c503a0225da39a88ac4b24cfec788252bca5c0 + languageName: node + linkType: hard + "@tinyhttp/content-disposition@npm:^2.2.0": version: 2.2.2 resolution: "@tinyhttp/content-disposition@npm:2.2.2" @@ -13970,6 +13991,27 @@ __metadata: languageName: node linkType: hard +"@types/docker-modem@npm:*": + version: 3.0.6 + resolution: "@types/docker-modem@npm:3.0.6" + dependencies: + "@types/node": "*" + "@types/ssh2": "*" + checksum: cc58e8189f6ec5a2b8ca890207402178a97ddac8c80d125dc65d8ab29034b5db736de15e99b91b2d74e66d14e26e73b6b8b33216613dd15fd3aa6b82c11a83ed + languageName: node + linkType: hard + +"@types/dockerode@npm:^3.3.29": + version: 3.3.35 + resolution: "@types/dockerode@npm:3.3.35" + dependencies: + "@types/docker-modem": "*" + "@types/node": "*" + "@types/ssh2": "*" + checksum: a59b7637de3a572bf4d41dc0cba3be75f14a734ae6e5f79c7f17dbfd1f100e507008f5a6b6aac657ccd13e67dd5d2cf6ee4608ded5ce97021737125d110a1d3b + languageName: node + linkType: hard + "@types/dompurify@npm:^3.0.5": version: 3.0.5 resolution: "@types/dompurify@npm:3.0.5" @@ -14084,6 +14126,13 @@ __metadata: languageName: node linkType: hard +"@types/geojson@npm:^7946.0.14": + version: 7946.0.16 + resolution: "@types/geojson@npm:7946.0.16" + checksum: d66e5e023f43b3e7121448117af1930af7d06410a32a585a8bc9c6bb5d97e0d656cd93d99e31fa432976c32e98d4b780f82bf1fd1acd20ccf952eb6b8e39edf2 + languageName: node + linkType: hard + "@types/glob@npm:^7.1.3": version: 7.2.0 resolution: "@types/glob@npm:7.2.0" @@ -14497,6 +14546,15 @@ __metadata: languageName: node linkType: hard +"@types/node@npm:^22.5.4": + version: 22.13.5 + resolution: "@types/node@npm:22.13.5" + dependencies: + undici-types: ~6.20.0 + checksum: 8789d9bc3efd212819fd03f7bbd429901b076703e9852ccf4950c8c7cd300d5d5a05f273d0936cbaf28194485d2bd0c265a1a25390720e353a53359526c28fb3 + languageName: node + linkType: hard + "@types/node@npm:~10.14.19": version: 10.14.22 
resolution: "@types/node@npm:10.14.22" @@ -14833,6 +14891,34 @@ __metadata: languageName: node linkType: hard +"@types/ssh2-streams@npm:*": + version: 0.1.12 + resolution: "@types/ssh2-streams@npm:0.1.12" + dependencies: + "@types/node": "*" + checksum: aa0aa45e40cfca34b4443dafa8d28ff49196c05c71867cbf0a8cdd5127be4d8a3840819543fcad16535653ca8b0e29217671ed6500ff1e7a3ad2442c5d1b40a6 + languageName: node + linkType: hard + +"@types/ssh2@npm:*": + version: 1.15.4 + resolution: "@types/ssh2@npm:1.15.4" + dependencies: + "@types/node": ^18.11.18 + checksum: 1b748e1a5fdaf06557d183b8d19df4449b5b25cc18930aff426402e82be816717a099f7a43e104b177144b20c2e22da2bba3ba716d64c53cd54be33187cf85a1 + languageName: node + linkType: hard + +"@types/ssh2@npm:^0.5.48": + version: 0.5.52 + resolution: "@types/ssh2@npm:0.5.52" + dependencies: + "@types/node": "*" + "@types/ssh2-streams": "*" + checksum: bc1c76ac727ad73ddd59ba849cf0ea3ed2e930439e7a363aff24f04f29b74f9b1976369b869dc9a018223c9fb8ad041c09a0f07aea8cf46a8c920049188cddae + languageName: node + linkType: hard + "@types/stack-utils@npm:^2.0.0": version: 2.0.1 resolution: "@types/stack-utils@npm:2.0.1" @@ -16308,6 +16394,36 @@ __metadata: languageName: node linkType: hard +"archiver-utils@npm:^5.0.0, archiver-utils@npm:^5.0.2": + version: 5.0.2 + resolution: "archiver-utils@npm:5.0.2" + dependencies: + glob: ^10.0.0 + graceful-fs: ^4.2.0 + is-stream: ^2.0.1 + lazystream: ^1.0.0 + lodash: ^4.17.15 + normalize-path: ^3.0.0 + readable-stream: ^4.0.0 + checksum: 7dc4f3001dc373bd0fa7671ebf08edf6f815cbc539c78b5478a2eaa67e52e3fc0e92f562cdef2ba016c4dcb5468d3d069eb89535c6844da4a5bb0baf08ad5720 + languageName: node + linkType: hard + +"archiver@npm:^7.0.1": + version: 7.0.1 + resolution: "archiver@npm:7.0.1" + dependencies: + archiver-utils: ^5.0.2 + async: ^3.2.4 + buffer-crc32: ^1.0.0 + readable-stream: ^4.0.0 + readdir-glob: ^1.1.2 + tar-stream: ^3.0.0 + zip-stream: ^6.0.1 + checksum: 
f93bcc00f919e0bbb6bf38fddf111d6e4d1ed34721b73cc073edd37278303a7a9f67aa4abd6fd2beb80f6c88af77f2eb4f60276343f67605e3aea404e5ad93ea + languageName: node + linkType: hard + "are-we-there-yet@npm:^2.0.0": version: 2.0.0 resolution: "are-we-there-yet@npm:2.0.0" @@ -16584,6 +16700,15 @@ __metadata: languageName: node linkType: hard +"asn1@npm:^0.2.6": + version: 0.2.6 + resolution: "asn1@npm:0.2.6" + dependencies: + safer-buffer: ~2.1.0 + checksum: 39f2ae343b03c15ad4f238ba561e626602a3de8d94ae536c46a4a93e69578826305366dc09fbb9b56aec39b4982a463682f259c38e59f6fa380cd72cd61e493d + languageName: node + linkType: hard + "assemblyai@npm:^4.6.0": version: 4.6.0 resolution: "assemblyai@npm:4.6.0" @@ -16623,6 +16748,13 @@ __metadata: languageName: node linkType: hard +"async-lock@npm:^1.4.1": + version: 1.4.1 + resolution: "async-lock@npm:1.4.1" + checksum: 29e70cd892932b7c202437786cedc39ff62123cb6941014739bd3cabd6106326416e9e7c21285a5d1dc042cad239a0f7ec9c44658491ee4a615fd36a21c1d10a + languageName: node + linkType: hard + "async-mutex@npm:^0.5.0": version: 0.5.0 resolution: "async-mutex@npm:0.5.0" @@ -16648,6 +16780,13 @@ __metadata: languageName: node linkType: hard +"async@npm:^3.2.4": + version: 3.2.6 + resolution: "async@npm:3.2.6" + checksum: ee6eb8cd8a0ab1b58bd2a3ed6c415e93e773573a91d31df9d5ef559baafa9dab37d3b096fa7993e84585cac3697b2af6ddb9086f45d3ac8cae821bb2aab65682 + languageName: node + linkType: hard + "asynciterator.prototype@npm:^1.0.0": version: 1.0.0 resolution: "asynciterator.prototype@npm:1.0.0" @@ -17080,6 +17219,15 @@ __metadata: languageName: node linkType: hard +"bcrypt-pbkdf@npm:^1.0.2": + version: 1.0.2 + resolution: "bcrypt-pbkdf@npm:1.0.2" + dependencies: + tweetnacl: ^0.14.3 + checksum: 4edfc9fe7d07019609ccf797a2af28351736e9d012c8402a07120c4453a3b789a15f2ee1530dc49eee8f7eb9379331a8dd4b3766042b9e502f74a68e7f662291 + languageName: node + linkType: hard + "before-after-hook@npm:^2.2.0": version: 2.2.3 resolution: "before-after-hook@npm:2.2.3" @@ -17479,6 
+17627,13 @@ __metadata: languageName: node linkType: hard +"buffer-crc32@npm:^1.0.0": + version: 1.0.0 + resolution: "buffer-crc32@npm:1.0.0" + checksum: bc114c0e02fe621249e0b5093c70e6f12d4c2b1d8ddaf3b1b7bbe3333466700100e6b1ebdc12c050d0db845bc582c4fce8c293da487cc483f97eea027c480b23 + languageName: node + linkType: hard + "buffer-crc32@npm:~0.2.3": version: 0.2.13 resolution: "buffer-crc32@npm:0.2.13" @@ -17544,6 +17699,13 @@ __metadata: languageName: node linkType: hard +"buildcheck@npm:~0.0.6": + version: 0.0.6 + resolution: "buildcheck@npm:0.0.6" + checksum: ad61759dc98d62e931df2c9f54ccac7b522e600c6e13bdcfdc2c9a872a818648c87765ee209c850f022174da4dd7c6a450c00357c5391705d26b9c5807c2a076 + languageName: node + linkType: hard + "builtins@npm:^5.0.0": version: 5.0.1 resolution: "builtins@npm:5.0.1" @@ -17580,6 +17742,13 @@ __metadata: languageName: node linkType: hard +"byline@npm:^5.0.0": + version: 5.0.0 + resolution: "byline@npm:5.0.0" + checksum: 737ca83e8eda2976728dae62e68bc733aea095fab08db4c6f12d3cee3cf45b6f97dce45d1f6b6ff9c2c947736d10074985b4425b31ce04afa1985a4ef3d334a7 + languageName: node + linkType: hard + "bytes@npm:3.0.0": version: 3.0.0 resolution: "bytes@npm:3.0.0" @@ -18679,6 +18848,19 @@ __metadata: languageName: node linkType: hard +"compress-commons@npm:^6.0.2": + version: 6.0.2 + resolution: "compress-commons@npm:6.0.2" + dependencies: + crc-32: ^1.2.0 + crc32-stream: ^6.0.0 + is-stream: ^2.0.1 + normalize-path: ^3.0.0 + readable-stream: ^4.0.0 + checksum: 37d79a54f91344ecde352588e0a128f28ce619b085acd4f887defd76978a0640e3454a42c7dcadb0191bb3f971724ae4b1f9d6ef9620034aa0427382099ac946 + languageName: node + linkType: hard + "compressible@npm:^2.0.12, compressible@npm:~2.0.16": version: 2.0.18 resolution: "compressible@npm:2.0.18" @@ -19103,6 +19285,36 @@ __metadata: languageName: node linkType: hard +"cpu-features@npm:~0.0.10": + version: 0.0.10 + resolution: "cpu-features@npm:0.0.10" + dependencies: + buildcheck: ~0.0.6 + nan: ^2.19.0 + node-gyp: 
latest + checksum: ab17e25cea0b642bdcfd163d3d872be4cc7d821e854d41048557799e990d672ee1cc7bd1d4e7c4de0309b1683d4c001d36ba8569b5035d1e7e2ff2d681f681d7 + languageName: node + linkType: hard + +"crc-32@npm:^1.2.0": + version: 1.2.2 + resolution: "crc-32@npm:1.2.2" + bin: + crc32: bin/crc32.njs + checksum: ad2d0ad0cbd465b75dcaeeff0600f8195b686816ab5f3ba4c6e052a07f728c3e70df2e3ca9fd3d4484dc4ba70586e161ca5a2334ec8bf5a41bf022a6103ff243 + languageName: node + linkType: hard + +"crc32-stream@npm:^6.0.0": + version: 6.0.0 + resolution: "crc32-stream@npm:6.0.0" + dependencies: + crc-32: ^1.2.0 + readable-stream: ^4.0.0 + checksum: e6edc2f81bc387daef6d18b2ac18c2ffcb01b554d3b5c7d8d29b177505aafffba574658fdd23922767e8dab1183d1962026c98c17e17fb272794c33293ef607c + languageName: node + linkType: hard + "create-jest@npm:^29.7.0": version: 29.7.0 resolution: "create-jest@npm:29.7.0" @@ -20548,6 +20760,38 @@ __metadata: languageName: node linkType: hard +"docker-compose@npm:^0.24.8": + version: 0.24.8 + resolution: "docker-compose@npm:0.24.8" + dependencies: + yaml: ^2.2.2 + checksum: 48f3564c46490f1f51899a144deb546b61450a76bffddb378379ac7702aa34b055e0237e0dc77507df94d7ad6f1f7daeeac27730230bce9aafe2e35efeda6b45 + languageName: node + linkType: hard + +"docker-modem@npm:^3.0.0": + version: 3.0.8 + resolution: "docker-modem@npm:3.0.8" + dependencies: + debug: ^4.1.1 + readable-stream: ^3.5.0 + split-ca: ^1.0.1 + ssh2: ^1.11.0 + checksum: e3675c9b1ad800be8fb1cb9c5621fbef20a75bfedcd6e01b69808eadd7f0165681e4e30d1700897b788a67dbf4769964fcccd19c3d66f6d2499bb7aede6b34df + languageName: node + linkType: hard + +"dockerode@npm:^3.3.5": + version: 3.3.5 + resolution: "dockerode@npm:3.3.5" + dependencies: + "@balena/dockerignore": ^1.0.2 + docker-modem: ^3.0.0 + tar-fs: ~2.0.1 + checksum: 7f6650422b07fa7ea9d5801f04b1a432634446b5fe37b995b8302b953b64e93abf1bb4596c2fb574ba47aafee685ef2ab959cc86c9654add5a26d09541bbbcc6 + languageName: node + linkType: hard + "doctrine@npm:^2.1.0": version: 2.1.0 
resolution: "doctrine@npm:2.1.0" @@ -22571,6 +22815,7 @@ __metadata: js-yaml: ^4.1.0 langchain: "workspace:*" langsmith: ">=0.2.8 <0.4.0" + mariadb: ^3.4.0 mongodb: ^6.3.0 pg: ^8.11.0 pickleparser: ^0.2.1 @@ -23934,6 +24179,13 @@ __metadata: languageName: node linkType: hard +"get-port@npm:^5.1.1": + version: 5.1.1 + resolution: "get-port@npm:5.1.1" + checksum: 0162663ffe5c09e748cd79d97b74cd70e5a5c84b760a475ce5767b357fb2a57cb821cee412d646aa8a156ed39b78aab88974eddaa9e5ee926173c036c0713787 + languageName: node + linkType: hard + "get-stdin@npm:^8.0.0": version: 8.0.0 resolution: "get-stdin@npm:8.0.0" @@ -24147,6 +24399,22 @@ __metadata: languageName: node linkType: hard +"glob@npm:^10.0.0, glob@npm:^10.3.7": + version: 10.4.5 + resolution: "glob@npm:10.4.5" + dependencies: + foreground-child: ^3.1.0 + jackspeak: ^3.1.2 + minimatch: ^9.0.4 + minipass: ^7.1.2 + package-json-from-dist: ^1.0.0 + path-scurry: ^1.11.1 + bin: + glob: dist/esm/bin.mjs + checksum: 0bc725de5e4862f9f387fd0f2b274baf16850dcd2714502ccf471ee401803997983e2c05590cb65f9675a3c6f2a58e7a53f9e365704108c6ad3cbf1d60934c4a + languageName: node + linkType: hard + "glob@npm:^10.2.2": version: 10.3.12 resolution: "glob@npm:10.3.12" @@ -24192,22 +24460,6 @@ __metadata: languageName: node linkType: hard -"glob@npm:^10.3.7": - version: 10.4.5 - resolution: "glob@npm:10.4.5" - dependencies: - foreground-child: ^3.1.0 - jackspeak: ^3.1.2 - minimatch: ^9.0.4 - minipass: ^7.1.2 - package-json-from-dist: ^1.0.0 - path-scurry: ^1.11.1 - bin: - glob: dist/esm/bin.mjs - checksum: 0bc725de5e4862f9f387fd0f2b274baf16850dcd2714502ccf471ee401803997983e2c05590cb65f9675a3c6f2a58e7a53f9e365704108c6ad3cbf1d60934c4a - languageName: node - linkType: hard - "glob@npm:^7.0.0, glob@npm:^7.1.3, glob@npm:^7.1.4, glob@npm:^7.1.6": version: 7.2.3 resolution: "glob@npm:7.2.3" @@ -26316,7 +26568,7 @@ __metadata: languageName: node linkType: hard -"is-stream@npm:^2.0.0": +"is-stream@npm:^2.0.0, is-stream@npm:^2.0.1": version: 2.0.1 
resolution: "is-stream@npm:2.0.1" checksum: b8e05ccdf96ac330ea83c12450304d4a591f9958c11fd17bed240af8d5ffe08aedafa4c0f4cfccd4d28dc9d4d129daca1023633d5c11601a6cbc77521f6fae66 @@ -28293,6 +28545,15 @@ __metadata: languageName: node linkType: hard +"lazystream@npm:^1.0.0": + version: 1.0.1 + resolution: "lazystream@npm:1.0.1" + dependencies: + readable-stream: ^2.0.5 + checksum: 822c54c6b87701a6491c70d4fabc4cafcf0f87d6b656af168ee7bb3c45de9128a801cb612e6eeeefc64d298a7524a698dd49b13b0121ae50c2ae305f0dcc5310 + languageName: node + linkType: hard + "leac@npm:^0.6.0": version: 0.6.0 resolution: "leac@npm:0.6.0" @@ -28710,7 +28971,7 @@ __metadata: languageName: node linkType: hard -"lodash@npm:4.17.21, lodash@npm:^4.17.19, lodash@npm:^4.17.20, lodash@npm:^4.17.21": +"lodash@npm:4.17.21, lodash@npm:^4.17.15, lodash@npm:^4.17.19, lodash@npm:^4.17.20, lodash@npm:^4.17.21": version: 4.17.21 resolution: "lodash@npm:4.17.21" checksum: eb835a2e51d381e561e508ce932ea50a8e5a68f4ebdd771ea240d3048244a8d13658acbd502cd4829768c56f2e16bdd4340b9ea141297d472517b83868e677f7 @@ -28893,6 +29154,13 @@ __metadata: languageName: node linkType: hard +"lru-cache@npm:^10.3.0": + version: 10.4.3 + resolution: "lru-cache@npm:10.4.3" + checksum: 6476138d2125387a6d20f100608c2583d415a4f64a0fecf30c9e2dda976614f09cad4baa0842447bd37dd459a7bd27f57d9d8f8ce558805abd487c583f3d774a + languageName: node + linkType: hard + "lru-cache@npm:^5.1.1": version: 5.1.1 resolution: "lru-cache@npm:5.1.1" @@ -29114,6 +29382,19 @@ __metadata: languageName: node linkType: hard +"mariadb@npm:^3.4.0": + version: 3.4.0 + resolution: "mariadb@npm:3.4.0" + dependencies: + "@types/geojson": ^7946.0.14 + "@types/node": ^22.5.4 + denque: ^2.1.0 + iconv-lite: ^0.6.3 + lru-cache: ^10.3.0 + checksum: 89e27ae2911541fa8ff5e5dfb20d5c4dd47005323027bf6bf2975a0710a2d4cde28e20cef4d7825058411ad673c9de1f64bc1f409f6b0e92237ddb0a0cc9d46f + languageName: node + linkType: hard + "markdown-escapes@npm:^1.0.0": version: 1.0.4 resolution: 
"markdown-escapes@npm:1.0.4" @@ -29462,7 +29743,7 @@ __metadata: languageName: node linkType: hard -"minimatch@npm:^5.0.1": +"minimatch@npm:^5.0.1, minimatch@npm:^5.1.0": version: 5.1.6 resolution: "minimatch@npm:5.1.6" dependencies: @@ -29947,6 +30228,15 @@ __metadata: languageName: node linkType: hard +"nan@npm:^2.19.0, nan@npm:^2.20.0": + version: 2.22.1 + resolution: "nan@npm:2.22.1" + dependencies: + node-gyp: latest + checksum: 984c07db9f94b7faf19c643d20a7bf66bb311e4c21d62204bb3a09310081550773441bacc928427c2438b9169e3dbd671cf0c8f1765eb4d87c5bd556d38af76b + languageName: node + linkType: hard + "nanoid@npm:^3.3.6": version: 3.3.6 resolution: "nanoid@npm:3.3.6" @@ -32964,6 +33254,15 @@ __metadata: languageName: node linkType: hard +"properties-reader@npm:^2.3.0": + version: 2.3.0 + resolution: "properties-reader@npm:2.3.0" + dependencies: + mkdirp: ^1.0.4 + checksum: cbf59e862dc507f8ce1f8d7641ed9737119f16a1d4dad8e79f17b303aaca1c6af7d36ddfef0f649cab4d200ba4334ac159af0b238f6978a085f5b1b5126b6cc3 + languageName: node + linkType: hard + "property-information@npm:^5.0.0, property-information@npm:^5.3.0": version: 5.6.0 resolution: "property-information@npm:5.6.0" @@ -33592,7 +33891,7 @@ __metadata: languageName: node linkType: hard -"readable-stream@npm:3, readable-stream@npm:^3.0.6": +"readable-stream@npm:3, readable-stream@npm:^3.0.6, readable-stream@npm:^3.5.0": version: 3.6.2 resolution: "readable-stream@npm:3.6.2" dependencies: @@ -33616,7 +33915,7 @@ __metadata: languageName: node linkType: hard -"readable-stream@npm:^2.0.0, readable-stream@npm:^2.0.1, readable-stream@npm:^2.3.0, readable-stream@npm:^2.3.5, readable-stream@npm:~2.3.6": +"readable-stream@npm:^2.0.0, readable-stream@npm:^2.0.1, readable-stream@npm:^2.0.5, readable-stream@npm:^2.3.0, readable-stream@npm:^2.3.5, readable-stream@npm:~2.3.6": version: 2.3.8 resolution: "readable-stream@npm:2.3.8" dependencies: @@ -33642,6 +33941,19 @@ __metadata: languageName: node linkType: hard 
+"readable-stream@npm:^4.0.0": + version: 4.7.0 + resolution: "readable-stream@npm:4.7.0" + dependencies: + abort-controller: ^3.0.0 + buffer: ^6.0.3 + events: ^3.3.0 + process: ^0.11.10 + string_decoder: ^1.3.0 + checksum: 03ec762faed8e149dc6452798b60394a8650861a1bb4bf936fa07b94044826bc25abe73696f5f45372abc404eec01876c560f64b479eba108b56397312dbe2ae + languageName: node + linkType: hard + "readable-web-to-node-stream@npm:^3.0.0": version: 3.0.2 resolution: "readable-web-to-node-stream@npm:3.0.2" @@ -33651,6 +33963,15 @@ __metadata: languageName: node linkType: hard +"readdir-glob@npm:^1.1.2": + version: 1.1.3 + resolution: "readdir-glob@npm:1.1.3" + dependencies: + minimatch: ^5.1.0 + checksum: 1dc0f7440ff5d9378b593abe9d42f34ebaf387516615e98ab410cf3a68f840abbf9ff1032d15e0a0dbffa78f9e2c46d4fafdbaac1ca435af2efe3264e3f21874 + languageName: node + linkType: hard + "readdirp@npm:~3.6.0": version: 3.6.0 resolution: "readdirp@npm:3.6.0" @@ -34647,7 +34968,7 @@ __metadata: languageName: node linkType: hard -"safer-buffer@npm:>= 2.1.2 < 3, safer-buffer@npm:>= 2.1.2 < 3.0.0": +"safer-buffer@npm:>= 2.1.2 < 3, safer-buffer@npm:>= 2.1.2 < 3.0.0, safer-buffer@npm:~2.1.0": version: 2.1.2 resolution: "safer-buffer@npm:2.1.2" checksum: cab8f25ae6f1434abee8d80023d7e72b598cf1327164ddab31003c51215526801e40b66c5e65d658a0af1e9d6478cadcb4c745f4bd6751f97d8644786c0978b0 @@ -35573,6 +35894,13 @@ __metadata: languageName: node linkType: hard +"split-ca@npm:^1.0.1": + version: 1.0.1 + resolution: "split-ca@npm:1.0.1" + checksum: 1e7409938a95ee843fe2593156a5735e6ee63772748ee448ea8477a5a3e3abde193c3325b3696e56a5aff07c7dcf6b1f6a2f2a036895b4f3afe96abb366d893f + languageName: node + linkType: hard + "split2@npm:^4.1.0": version: 4.2.0 resolution: "split2@npm:4.2.0" @@ -35630,6 +35958,33 @@ __metadata: languageName: node linkType: hard +"ssh-remote-port-forward@npm:^1.0.4": + version: 1.0.4 + resolution: "ssh-remote-port-forward@npm:1.0.4" + dependencies: + "@types/ssh2": ^0.5.48 + ssh2: ^1.4.0 + 
checksum: c6c04c5ddfde7cb06e9a8655a152bd28fe6771c6fe62ff0bc08be229491546c410f30b153c968b8d6817a57d38678a270c228f30143ec0fe1be546efc4f6b65a + languageName: node + linkType: hard + +"ssh2@npm:^1.11.0, ssh2@npm:^1.4.0": + version: 1.16.0 + resolution: "ssh2@npm:1.16.0" + dependencies: + asn1: ^0.2.6 + bcrypt-pbkdf: ^1.0.2 + cpu-features: ~0.0.10 + nan: ^2.20.0 + dependenciesMeta: + cpu-features: + optional: true + nan: + optional: true + checksum: c024c4a432aae2457852037f31c0d9bec323fb062ace3a31e4a6dd6c55842246c80e7d20ff93ffed22dde1e523250d8438bc2f7d4a1450cf4fa4887818176f0e + languageName: node + linkType: hard + "ssri@npm:^10.0.0": version: 10.0.5 resolution: "ssri@npm:10.0.5" @@ -36361,6 +36716,18 @@ __metadata: languageName: node linkType: hard +"tar-fs@npm:~2.0.1": + version: 2.0.1 + resolution: "tar-fs@npm:2.0.1" + dependencies: + chownr: ^1.1.1 + mkdirp-classic: ^0.5.2 + pump: ^3.0.0 + tar-stream: ^2.0.0 + checksum: 26cd297ed2421bc8038ce1a4ca442296b53739f409847d495d46086e5713d8db27f2c03ba2f461d0f5ddbc790045628188a8544f8ae32cbb6238b279b68d0247 + languageName: node + linkType: hard + "tar-stream@npm:^1.5.2": version: 1.6.2 resolution: "tar-stream@npm:1.6.2" @@ -36376,7 +36743,7 @@ __metadata: languageName: node linkType: hard -"tar-stream@npm:^2.1.4": +"tar-stream@npm:^2.0.0, tar-stream@npm:^2.1.4": version: 2.2.0 resolution: "tar-stream@npm:2.2.0" dependencies: @@ -36389,6 +36756,17 @@ __metadata: languageName: node linkType: hard +"tar-stream@npm:^3.0.0": + version: 3.1.7 + resolution: "tar-stream@npm:3.1.7" + dependencies: + b4a: ^1.6.4 + fast-fifo: ^1.2.0 + streamx: ^2.15.0 + checksum: 6393a6c19082b17b8dcc8e7fd349352bb29b4b8bfe1075912b91b01743ba6bb4298f5ff0b499a3bbaf82121830e96a1a59d4f21a43c0df339e54b01789cb8cc6 + languageName: node + linkType: hard + "tar-stream@npm:^3.1.5, tar-stream@npm:^3.1.6": version: 3.1.6 resolution: "tar-stream@npm:3.1.6" @@ -36516,6 +36894,29 @@ __metadata: languageName: node linkType: hard +"testcontainers@npm:^10.18.0": + version: 
10.18.0 + resolution: "testcontainers@npm:10.18.0" + dependencies: + "@balena/dockerignore": ^1.0.2 + "@types/dockerode": ^3.3.29 + archiver: ^7.0.1 + async-lock: ^1.4.1 + byline: ^5.0.0 + debug: ^4.3.5 + docker-compose: ^0.24.8 + dockerode: ^3.3.5 + get-port: ^5.1.1 + proper-lockfile: ^4.1.2 + properties-reader: ^2.3.0 + ssh-remote-port-forward: ^1.0.4 + tar-fs: ^3.0.6 + tmp: ^0.2.3 + undici: ^5.28.5 + checksum: 41f27a01ac7d0e639d2fd026df49ffc8ad74da2ebed5f87644c7b29ecd2c383537ec13c8b5143f09c64c904741131be56c195595e32e2b463763c279fa377020 + languageName: node + linkType: hard + "text-decoder@npm:^1.1.0": version: 1.1.1 resolution: "text-decoder@npm:1.1.1" @@ -36637,6 +37038,13 @@ __metadata: languageName: node linkType: hard +"tmp@npm:^0.2.3": + version: 0.2.3 + resolution: "tmp@npm:0.2.3" + checksum: 73b5c96b6e52da7e104d9d44afb5d106bb1e16d9fa7d00dbeb9e6522e61b571fbdb165c756c62164be9a3bbe192b9b268c236d370a2a0955c7689cd2ae377b95 + languageName: node + linkType: hard + "tmpl@npm:1.0.5": version: 1.0.5 resolution: "tmpl@npm:1.0.5" @@ -37065,6 +37473,13 @@ __metadata: languageName: node linkType: hard +"tweetnacl@npm:^0.14.3": + version: 0.14.5 + resolution: "tweetnacl@npm:0.14.5" + checksum: 6061daba1724f59473d99a7bb82e13f211cdf6e31315510ae9656fefd4779851cb927adad90f3b488c8ed77c106adc0421ea8055f6f976ff21b27c5c4e918487 + languageName: node + linkType: hard + "type-check@npm:^0.4.0, type-check@npm:~0.4.0": version: 0.4.0 resolution: "type-check@npm:0.4.0" @@ -37671,6 +38086,13 @@ __metadata: languageName: node linkType: hard +"undici-types@npm:~6.20.0": + version: 6.20.0 + resolution: "undici-types@npm:6.20.0" + checksum: b7bc50f012dc6afbcce56c9fd62d7e86b20a62ff21f12b7b5cbf1973b9578d90f22a9c7fe50e638e96905d33893bf2f9f16d98929c4673c2480de05c6c96ea8b + languageName: node + linkType: hard + "undici@npm:5.27.2": version: 5.27.2 resolution: "undici@npm:5.27.2" @@ -37698,6 +38120,15 @@ __metadata: languageName: node linkType: hard +"undici@npm:^5.28.5": + version: 5.28.5 + 
resolution: "undici@npm:5.28.5" + dependencies: + "@fastify/busboy": ^2.0.0 + checksum: a402d699a602a8feee1c0f78267467c8ffcbd7682267fec7a1307fd11554a32976a2307bf1cc8bf6ef7a667654336592fbd66d675df20ce28357536fb55a3a7d + languageName: node + linkType: hard + "undici@npm:~5.28.4": version: 5.28.4 resolution: "undici@npm:5.28.4" @@ -39368,6 +39799,15 @@ __metadata: languageName: node linkType: hard +"yaml@npm:^2.2.2": + version: 2.7.0 + resolution: "yaml@npm:2.7.0" + bin: + yaml: bin.mjs + checksum: 6e8b2f9b9d1b18b10274d58eb3a47ec223d9a93245a890dcb34d62865f7e744747190a9b9177d5f0ef4ea2e44ad2c0214993deb42e0800766203ac46f00a12dd + languageName: node + linkType: hard + "yaml@npm:^2.4.5": version: 2.4.5 resolution: "yaml@npm:2.4.5" @@ -39479,6 +39919,17 @@ __metadata: languageName: node linkType: hard +"zip-stream@npm:^6.0.1": + version: 6.0.1 + resolution: "zip-stream@npm:6.0.1" + dependencies: + archiver-utils: ^5.0.0 + compress-commons: ^6.0.2 + readable-stream: ^4.0.0 + checksum: aa5abd6a89590eadeba040afbc375f53337f12637e5e98330012a12d9886cde7a3ccc28bd91aafab50576035bbb1de39a9a316eecf2411c8b9009c9f94f0db27 + languageName: node + linkType: hard + "zod-to-json-schema@npm:3.20.3": version: 3.20.3 resolution: "zod-to-json-schema@npm:3.20.3"