Simsity is a Super Simple Similarities Service[tm].
It's all about building a neighborhood. Literally!
This simple library is partially inspired by this blog post by Max Woolf. You don't always need a full-fledged vector database; Polars and numpy might be plenty! And for those moments, simsity is all you need.
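To make the point about numpy being plenty more concrete, here is a minimal brute-force nearest-neighbor search. It is a sketch, not simsity's implementation (simsity itself builds an approximate nearest-neighbor index), and it sticks to the standard library so it runs anywhere. The names cosine and top_k_cosine and the toy vectors are illustrative.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_cosine(query: list[float], vectors: list[list[float]], k: int = 3) -> list[tuple[int, float]]:
    """Return (position, cosine distance) pairs for the k closest vectors.

    Every stored vector is scored, so the result is exact but costs O(n) per query.
    """
    scored = ((i, 1.0 - cosine(query, v)) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
result = top_k_cosine([1.0, 0.1], vectors, k=2)
print(result)  # nearest first: position 0, then position 2
```

Because every stored vector is scored, this is exact but linear in the number of items. With numpy the loop becomes a single matrix-vector product over row-normalized vectors, which is often fast enough for medium-sized datasets; an ANN index trades a little accuracy for faster lookups on larger ones.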
You can install simsity via pip. The command below uses uv's pip interface; plain pip install simsity works just as well.
uv pip install simsity
The goal of simsity is to be minimal: to make rapid prototyping easy and to be "just enough" for medium-sized datasets. You will mainly interact with these two functions:
from simsity import create_index, load_index
As their names imply, you can use these to create an index or to load one from disk.
from simsity import create_index, load_index
from simsity.datasets import fetch_recipes
# Let's fetch some demo data
recipes = fetch_recipes()["text"].to_list()
# Let's use model2vec for embeddings
from model2vec import StaticModel
model = StaticModel.from_pretrained("minishlab/potion-base-8M")
# Populate the ANN vector index and use it.
index = create_index(recipes, model.encode)
texts, dists = index.query("pork")
# You can also query using vectors
v_pork = model.encode(["pork"])[0]
texts, dists = index.query_vector(v_pork)
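The two query methods appear to be two entry points to the same search: query takes text and query_vector takes an embedding, so query("pork") presumably behaves like query_vector(model.encode(["pork"])[0]). The sketch below is a hypothetical in-memory stand-in, not simsity's class, that makes this relationship explicit. ToyIndex, cosine_distance, the n parameter and toy_encode (a crude letter-count encoder, so no model download is needed) are all invented for illustration.

```python
import math
from collections.abc import Callable, Sequence

Vector = list[float]


def toy_encode(texts: Sequence[str]) -> list[Vector]:
    """Hypothetical encoder: 26-dim letter counts, one vector per text."""
    vectors = []
    for text in texts:
        counts = [0.0] * 26
        for ch in text.lower():
            if "a" <= ch <= "z":
                counts[ord(ch) - ord("a")] += 1.0
        vectors.append(counts)
    return vectors


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


class ToyIndex:
    """Hypothetical brute-force index mirroring the query/query_vector pair."""

    def __init__(self, texts: Sequence[str], encoder: Callable[[Sequence[str]], list[Vector]]):
        self.texts = list(texts)
        self.encoder = encoder
        self.vectors = encoder(self.texts)  # embed every text once, up front

    def query_vector(self, vector: Vector, n: int = 3) -> tuple[list[str], list[float]]:
        # Score all stored vectors and keep the n closest (smallest distance first).
        scored = sorted((cosine_distance(v, vector), i) for i, v in enumerate(self.vectors))[:n]
        return [self.texts[i] for _, i in scored], [d for d, _ in scored]

    def query(self, text: str, n: int = 3) -> tuple[list[str], list[float]]:
        # A text query is just a vector query on the encoded text.
        return self.query_vector(self.encoder([text])[0], n=n)


index = ToyIndex(["pork belly", "apple pie", "pulled pork"], toy_encode)
texts, dists = index.query("pork", n=2)
print(texts)  # ['pulled pork', 'pork belly']
```

Keeping query as a thin wrapper around query_vector means an embedding you already have, for example one computed elsewhere, can be searched without going back to text.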
If you also provide a path, the index is stored on disk, so you can load it again later without rebuilding it.
# Reuse the model2vec encoder from the example above
encoder = model.encode
# Make an index with a path
index = create_index(recipes, encoder, path="demo")
# Load an index from a path
reloaded_index = load_index(path="demo", encoder=encoder)
texts, dists = reloaded_index.query("pork")
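Because load_index takes an encoder argument, it seems the encoder itself is not written to disk, only the indexed data, so you need to supply the same encoder when reloading. The sketch below is a hypothetical illustration of that split using a JSON file; save_toy_index and load_toy_index are invented names, and simsity's real on-disk format is not described here and is likely different.

```python
import json
import tempfile
from pathlib import Path


def save_toy_index(path: Path, texts: list[str], vectors: list[list[float]]) -> None:
    """Write texts and their vectors to a folder; the encoder is deliberately not saved."""
    folder = Path(path)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "data.json").write_text(json.dumps({"texts": texts, "vectors": vectors}))


def load_toy_index(path: Path, encoder):
    """Read the stored data back and re-attach an encoder for text queries."""
    payload = json.loads((Path(path) / "data.json").read_text())
    return payload["texts"], payload["vectors"], encoder


# Placeholder encoder for the demo: one length-based vector per text.
def placeholder_encoder(texts):
    return [[float(len(t))] for t in texts]


with tempfile.TemporaryDirectory() as tmp:
    save_toy_index(Path(tmp) / "demo", ["pork belly"], [[0.5, 0.25]])
    texts, vectors, encoder = load_toy_index(Path(tmp) / "demo", encoder=placeholder_encoder)
```

Storing the vectors means a reload does not have to re-embed the whole dataset, while keeping the encoder out of the files avoids serializing model code alongside the data.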
That's it! Happy hacking!