koaning/simsity

simsity

Simsity is a Super Simple Similarities Service[tm].
It's all about building a neighborhood. Literally!

This simple library is partially inspired by a blog post by Max Woolf. You don't always need a full-fledged vector database. Polars and numpy might be plenty! And for those moments, simsity is all you need.
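To make the "numpy might be plenty" point concrete, here is a minimal sketch of a brute-force cosine search in plain numpy. It is not simsity's actual implementation; the function name `top_k_cosine` and the random demo vectors are assumptions made purely for illustration.

```python
import numpy as np


def top_k_cosine(query, matrix, k=5):
    """Return (indices, cosine distances) of the k rows of `matrix` closest to `query`.

    Hypothetical helper for illustration only, not part of simsity.
    """
    # Normalise rows and the query so a dot product equals cosine similarity.
    matrix_norm = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    sims = matrix_norm @ query_norm

    # Sort by descending similarity and keep the best k.
    k = min(k, len(sims))
    order = np.argsort(-sims)[:k]
    return order, 1.0 - sims[order]


# Demo data: 1000 random 64-dimensional vectors standing in for embeddings.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 64))

# Querying with a stored vector should return that same row first.
idx, dists = top_k_cosine(vectors[42], vectors, k=3)
```

An exhaustive scan like this is a single matrix-vector product per query, which is often fast enough for prototyping on modest datasets.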

Install

You can install simsity with pip, here invoked through uv.

uv pip install simsity

The goal of simsity is to be minimal, to make rapid prototyping very easy, and to be "just enough" for medium-sized datasets. You will mainly interact with these two functions:

from simsity import create_index, load_index

As their names imply, you can use these to create an index or to load one from disk.

Quickstart

from simsity import create_index, load_index
from simsity.datasets import fetch_recipes

# Let's fetch some demo data
recipes = fetch_recipes()["text"].to_list()

# Let's use model2vec for embeddings 
from model2vec import StaticModel
model = StaticModel.from_pretrained("minishlab/potion-base-8M")

# Populate the ANN vector index and use it. 
index = create_index(recipes, model.encode)
texts, dists = index.query("pork")

# You can also query using vectors
v_pork = model.encode(["pork"])[0]
texts, dists = index.query_vector(v_pork)

If you also provide a path, the index is stored on disk and can be loaded again later.

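# Make an index with a path (reusing the model from the quickstart as the encoder)
index = create_index(recipes, model.encode, path="demo")

# Load an index from a path, passing the same encoder that built it
reloaded_index = load_index(path="demo", encoder=model.encode)
texts, dists = reloaded_index.query("pork")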
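In the examples above the encoder is just a callable that takes a list of texts and returns one vector per text. As a rough sketch of what a custom encoder of that shape could look like, here is a toy hashed bag-of-words encoder. The name `hashing_encoder` and its `dim` parameter are hypothetical, and whether simsity accepts any callable of this shape is an assumption, not something this README confirms.

```python
import hashlib

import numpy as np


def hashing_encoder(texts, dim=256):
    """Toy stand-in encoder: hashed bag-of-words with L2-normalised rows.

    Hypothetical helper for illustration only, not part of simsity.
    """
    out = np.zeros((len(texts), dim), dtype=np.float32)
    for row, text in enumerate(texts):
        for token in text.lower().split():
            # Map each token to a stable column via its MD5 digest.
            digest = hashlib.md5(token.encode("utf-8")).digest()
            col = int.from_bytes(digest[:4], "little") % dim
            out[row, col] += 1.0

    # Normalise each row, leaving all-zero rows (empty texts) untouched.
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return out / norms


vecs = hashing_encoder(["pork chops", "pork belly", "lemon tart"])

# Assumption: a callable like this could be passed wherever model.encode is used,
# e.g. create_index(recipes, hashing_encoder, path="demo").
```

A hashed encoder like this only captures token overlap, so it is mainly useful for testing the plumbing without downloading a model.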

That's it! Happy hacking!