This repo demonstrates a deterministic solution for searching documents and data, leveraging Gen AI technologies to assist end users with their domain-specific searches. It also examines some of the ground realities and common assumptions in this space, e.g. that Large Language Models (LLMs) are absolutely required for RAG solutions. In RAG solutions where the business domain is well scoped to a localized search, the actual search is provided by the vector database, and LLMs are only used to convert the response into more natural-sounding language (worldly knowledge is not much required here). LLMs often hallucinate and produce inaccurate results when used in straight-through processing, which makes them unsuitable for matters of consequence; Small Language Models (SLMs) could potentially be a better alternative here. This repo enables a configurable combination of vector DB and S/LLMs to evaluate the optimal solution for the desired outcomes.
- Search pre-indexed documents using the vector DB and, optionally, respond in natural language using S/LLM models.
- Search databases (e.g. Influx, SQL) using predefined queries stored in the vector DB or synthesized by an S/LLM model, and, optionally, respond in natural language using S/LLM models.
There are two primary use cases handled by this solution; they are described below along with their respective flows.
Users may want to search existing documents; these can be machine manuals in the industrial domain or regulatory/compliance policies in the financial domain.
On the database search side, this solution addresses the challenge faced by non-IT users who would benefit from data exploration beyond the well-thought-out, predefined queries written by IT upfront. For illustration, a predefined question-to-query pair might look like the sketch below.
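The shape below is a hypothetical illustration only: the field names, bucket, and measurement are invented for this sketch, and the actual storage format is defined by the solution itself. It shows the idea of pairing a natural-language question with a ready-made InfluxDB Flux query in the vector DB, so that a similarity match on the question retrieves a safe, predefined query to run.

```json
{
  "question": "What was the average temperature over the last 24 hours?",
  "query": "from(bucket: \"sensors\") |> range(start: -24h) |> filter(fn: (r) => r._measurement == \"temperature\") |> mean()"
}
```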
This section describes the steps to deploy this solution in your environment.
To run in GitHub Codespaces:

- Open this codespace in your browser or in your local Visual Studio Code.
- Install dependent services:

  ```shell
  make setup
  ```

- Run Document API:
  - Run the document search service:

    ```shell
    make run_doc
    ```

  - Open the Swagger link `http://localhost:5152/swagger/index.html` if you are on VS Code, or, if you are on Codespaces in a browser, open the Swagger link by appending `/swagger/index.html` to the hostname from the Ports tab. An example request is sketched after these steps.
- Run Data API:
  - Configure Influx DB as described here.
  - Run the data search service:

    ```shell
    make run_db
    ```

  - Open the Swagger link `http://localhost:5155/swagger/index.html` if you are on VS Code, or, if you are on Codespaces in a browser, open the Swagger link by appending `/swagger/index.html` to the hostname from the Ports tab.
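Whichever way you run it, once the Document API is up you can also exercise it from the command line. The route and payload below are illustrative assumptions, not the repo's actual contract; consult the Swagger UI for the real endpoints.

```shell
# Illustrative request; the route and body shape are assumptions, check Swagger for the actual API.
curl -X POST "http://localhost:5152/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "How do I reset the machine to factory settings?"}'
```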
Alternatively, to run locally:

- Clone the repo:

  ```shell
  git clone [email protected]:suneetnangia/rag-doc-data-search.git && cd rag-doc-data-search
  ```

- Optionally, open the repo in a pre-configured Dev Container.
- Install dependent services:

  ```shell
  make setup
  ```

- Run Document API:
  - Run the document search service:

    ```shell
    make run_doc
    ```

  - Open the Swagger link `http://localhost:5152/swagger/index.html` to try the APIs.
- Run Data API:
  - Configure Influx DB as described here.
  - Run the data search service:

    ```shell
    make run_db
    ```

  - Open the Swagger link `http://localhost:5155/swagger/index.html` to try the APIs (an example request is sketched below).
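Similarly, once the Data API is running, a hypothetical request might look like the following; again, the route and payload are assumptions, so check the Swagger UI for the actual contract.

```shell
# Illustrative request; the route and body shape are assumptions, check Swagger for the actual API.
curl -X POST "http://localhost:5155/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "What was the average temperature over the last 24 hours?"}'
```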
This repo makes use of Ollama to host both the embeddings models and the S/LLM models. Ollama provides various options for hosting and managing models; we surface some of those options, along with vector DB options, in this solution, and they can be configured via appsettings.
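As a rough sketch, such a configuration might look like the following. The key names, model names, and endpoints are illustrative assumptions, not the repo's actual schema; the Ollama endpoint shown is its default port, and .NET's JSON configuration provider permits comments.

```json
// Illustrative appsettings sketch; key names and values are assumptions, not the repo's actual schema.
{
  "Ollama": {
    "Endpoint": "http://localhost:11434",
    "EmbeddingModel": "all-minilm",
    "LanguageModel": "phi3"
  },
  "VectorDb": {
    "Endpoint": "http://localhost:6333",
    "CollectionName": "documents"
  }
}
```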
The following potential extensions can provide layers on top of this solution, offering an on-ramp for various use cases.
- CLI Repo: Provides access to the solution via a CLI for scripting and automation.
- Bootstrapping Repo: Loads sample data into the solution.
- K8s Repo: Deploys the solution in a Kubernetes (K8s) setting using the sidecar pattern.