TT-Inference-Server

Tenstorrent Inference Server (tt-inference-server) is the repository of model APIs available for deployment on Tenstorrent hardware.

Official Repository

https://github.com/tenstorrent/tt-inference-server

Getting Started

Follow the setup instructions for the model you want to serve. Each Model Name in the tables below links to the corresponding implementation.

Note: models with Status [🔍 preview] are under active development. If you encounter setup or stability problems, please file an issue and our team will address it.

LLMs

For an automated, pre-configured vLLM inference server using Docker, see the Model Readiness Workflows User Guide.

| Model Name | Model URL | Hardware | Status | tt-metal commit | vLLM commit | Docker Image |
| --- | --- | --- | --- | --- | --- | --- |
| QwQ-32B | HF Repo | TT-LoudBox/TT-QuietBox | 🔍 preview | v0.56.0-rc51 | e2e0002a | 0.0.4-v0.56.0-rc51-e2e0002ac7dc |
| DeepSeek-R1-Distill-Llama-70B | HF Repo | TT-LoudBox/TT-QuietBox | 🔍 preview | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Qwen2.5-72B | HF Repo | TT-LoudBox/TT-QuietBox | 🔍 preview | v0.56.0-rc33 | e2e0002a | 0.0.4-v0.56.0-rc33-e2e0002ac7dc |
| Qwen2.5-72B-Instruct | HF Repo | TT-LoudBox/TT-QuietBox | 🔍 preview | v0.56.0-rc33 | e2e0002a | 0.0.4-v0.56.0-rc33-e2e0002ac7dc |
| Qwen2.5-7B | HF Repo | n150 | 🔍 preview | v0.56.0-rc33 | e2e0002a | 0.0.4-v0.56.0-rc33-e2e0002ac7dc |
| Qwen2.5-7B-Instruct | HF Repo | n150 | 🔍 preview | v0.56.0-rc33 | e2e0002a | 0.0.4-v0.56.0-rc33-e2e0002ac7dc |
| Llama-3.3-70B-Instruct | HF Repo | TT-LoudBox/TT-QuietBox | ✅ ready | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.2-11B-Vision | HF Repo | n150 | 🔍 preview | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.2-11B-Vision-Instruct | HF Repo | n150 | 🔍 preview | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.2-1B | HF Repo | n150 | ✅ ready | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.2-1B-Instruct | HF Repo | n150 | ✅ ready | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.2-3B | HF Repo | n150 | ✅ ready | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.2-3B-Instruct | HF Repo | n150 | ✅ ready | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.1-70B | HF Repo | TT-LoudBox/TT-QuietBox | ✅ ready | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.1-70B-Instruct | HF Repo | TT-LoudBox/TT-QuietBox | ✅ ready | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.1-8B | HF Repo | n150 | ✅ ready | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
| Llama-3.1-8B-Instruct | HF Repo | n150 | ✅ ready | v0.56.0-rc47 | e2e0002a | 0.0.4-v0.56.0-rc47-e2e0002ac7dc |
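
The Docker Image tags in the LLM table appear to follow the pattern `<release>-<tt-metal commit>-<vLLM commit>`, where the last field is a longer prefix of the same hash shown in the vLLM commit column. That pattern is inferred from the rows above, not from a documented guarantee. Under that assumption, the minimal sketch below splits a tag into its parts and checks it against a table row. The names `ImageTag`, `parse_image_tag`, and `matches_row` are hypothetical helpers and are not part of tt-inference-server.

```python
from typing import NamedTuple


class ImageTag(NamedTuple):
    release: str          # e.g. "0.0.4"
    tt_metal_commit: str  # e.g. "v0.56.0-rc47"
    vllm_commit: str      # e.g. "e2e0002ac7dc"


def parse_image_tag(tag: str) -> ImageTag:
    """Split a tag of the ASSUMED form <release>-<tt-metal commit>-<vLLM commit>."""
    # The tt-metal commit itself contains a hyphen ("v0.56.0-rc47"), so peel off
    # the first and last hyphen-separated fields and keep the middle intact.
    release, rest = tag.split("-", 1)
    tt_metal_commit, vllm_commit = rest.rsplit("-", 1)
    return ImageTag(release, tt_metal_commit, vllm_commit)


def matches_row(tag: str, tt_metal_commit: str, vllm_commit: str) -> bool:
    """Check a tag against the tt-metal and vLLM commit columns of one row."""
    parsed = parse_image_tag(tag)
    # The table abbreviates the vLLM hash, so compare by prefix.
    return (parsed.tt_metal_commit == tt_metal_commit
            and parsed.vllm_commit.startswith(vllm_commit))


if __name__ == "__main__":
    tag = "0.0.4-v0.56.0-rc47-e2e0002ac7dc"
    print(parse_image_tag(tag))
    print(matches_row(tag, "v0.56.0-rc47", "e2e0002a"))  # True
```

A check like this can catch a mismatch between the image you pulled and the tt-metal commit listed for your model, for example when a row moves to a newer release candidate.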

CNNs

| Model Name | Model URL | Hardware | Status | Minimum Release Version |
| --- | --- | --- | --- | --- |
| YOLOv4 | GH Repo | n150 | 🔍 preview | v0.0.1 |