use llguidance library for constraints (including json schemas) #899

Merged
merged 29 commits into from
Dec 2, 2024
7fa4686
import llguidance modules
mmoskal Nov 6, 2024
1dcca91
llg constraint types
mmoskal Nov 6, 2024
eaf52be
integrate llguidance
mmoskal Nov 6, 2024
cc82722
fix handling of stop tokens
mmoskal Nov 6, 2024
fb83b5c
update toktrie
mmoskal Nov 6, 2024
00c36dd
remove submodules
mmoskal Nov 6, 2024
cab564b
fix version conflicts
mmoskal Nov 6, 2024
d1becbd
tok_trie -> tok_env rename
mmoskal Nov 6, 2024
f70b3de
update to latest llguidance
mmoskal Nov 30, 2024
3d22f2b
Merge branch 'master' into llg_cleanup
mmoskal Nov 30, 2024
ca9e346
sync lock
mmoskal Nov 30, 2024
7b3ae50
bump llg (lazy_static fix)
mmoskal Nov 30, 2024
146bc4c
update to latest llguidance, fix conflicts
mmoskal Nov 30, 2024
6019fee
import toktrie via llguidance
mmoskal Nov 30, 2024
dd35965
n=1
mmoskal Dec 1, 2024
3ac55ed
test with llama1b
mmoskal Dec 1, 2024
8ca7514
remove aici folder (no longer used)
mmoskal Dec 1, 2024
7967d42
use more specific type for llg grammars
mmoskal Dec 1, 2024
9a919d8
update python APIs to support json schema and llg
mmoskal Dec 1, 2024
0ad0948
update example to use lark not yacc
mmoskal Dec 1, 2024
816ac8f
rename example
mmoskal Dec 1, 2024
5fc906b
remove testing scripts
mmoskal Dec 1, 2024
5e9cbd2
re-export llguidance for easier LlguidanceGrammar construction
mmoskal Dec 2, 2024
ffcdd2a
fix formatting
mmoskal Dec 2, 2024
fda20fe
fix clippy
mmoskal Dec 2, 2024
b5add20
Merge branch 'master' into llg_cleanup
mmoskal Dec 2, 2024
2c59224
add python samples
mmoskal Dec 2, 2024
976092c
add server samples
mmoskal Dec 2, 2024
ac7c35d
add rust samples
mmoskal Dec 2, 2024
166 changes: 162 additions & 4 deletions Cargo.lock

2 changes: 1 addition & 1 deletion README.md
@@ -89,7 +89,7 @@ Mistral.rs supports several model categories:
**Easy**:
- Lightweight OpenAI API compatible HTTP server
- Python API
- - Grammar support with Regex and Yacc
+ - Grammar support with JSON Schema, Regex, Lark, and Guidance via [LLGuidance library](https://github.com/microsoft/llguidance)
- [ISQ](docs/ISQ.md) (In situ quantization): run `.safetensors` models directly from 🤗 Hugging Face by quantizing in-place
- Enhance performance with an [imatrix](docs/IMATRIX.md)!

2 changes: 1 addition & 1 deletion docs/HTTP.md
@@ -9,7 +9,7 @@ The API consists of the following endpoints. They can be viewed in your browser
To support additional features, we have extended the completion and chat completion request objects. Both have the same keys added:

- `top_k`: `int` | `null`. If non null, it is only relevant if positive.
- - `grammar`: `{"type" : "regex" | "yacc", "value": string}` or `null`. Grammar to use.
+ - `grammar`: `{"type" : "regex" | "lark" | "json_schema" | "llguidance", "value": string}` or `null`. Grammar to use.
- `adapters`: `array of string` | `null`. Adapter names to activate for this request.
- `min_p`: `float` | `null`. If non null, it is only relevant if 1 >= min_p >= 0.
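
As a rough sketch of how the extended `grammar` key might be filled in, the snippet below builds a chat completion request body with a regex constraint. The helper name `build_chat_request`, the `"default"` model name, and the prompt are hypothetical placeholders; only the `grammar` shape and the `top_k` / `min_p` keys come from the list above. It constructs and prints the JSON body and sends nothing to a server.

```python
import json

# Grammar types listed in docs/HTTP.md after this change ("yacc" is gone).
GRAMMAR_TYPES = {"regex", "lark", "json_schema", "llguidance"}


def build_chat_request(prompt, grammar_type=None, grammar_value=None,
                       top_k=None, min_p=None):
    """Hypothetical helper: assemble a request body with the extended keys."""
    if grammar_type is None:
        grammar = None  # `null` means generation is unconstrained
    else:
        if grammar_type not in GRAMMAR_TYPES:
            raise ValueError(f"unsupported grammar type: {grammar_type!r}")
        grammar = {"type": grammar_type, "value": grammar_value}
    return {
        "model": "default",  # assumed placeholder, not taken from the docs
        "messages": [{"role": "user", "content": prompt}],
        "top_k": top_k,
        "min_p": min_p,
        "grammar": grammar,
    }


body = build_chat_request(
    "Give me a US ZIP code.",
    grammar_type="regex",
    grammar_value=r"[0-9]{5}",
)
print(json.dumps(body, indent=2))
```

Rejecting unknown types on the client side is a design choice of this sketch; it makes a stale `"yacc"` request fail early instead of at the server.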

34 changes: 34 additions & 0 deletions examples/python/json_schema.py
@@ -0,0 +1,34 @@
from mistralrs import Runner, Which, ChatCompletionRequest, Architecture
from json import dumps

runner = Runner(
    which=Which.Plain(
        model_id="microsoft/Phi-3.5-mini-instruct",
    ),
    num_device_layers=["500"],
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="phi",
        messages=[{"role": "user", "content": "Give me a sample address."}],
        max_tokens=256,
        temperature=0.1,
        grammar_type="json_schema",
        grammar=dumps(
            {
                "type": "object",
                "properties": {
                    "street": {"type": "string"},
                    "city": {"type": "string"},
                    "state": {"type": "string", "pattern": "^[A-Z]{2}$"},
                    "zip": {"type": "integer", "minimum": 10000, "maximum": 99999},
                },
                "required": ["street", "city", "state", "zip"],
                "additionalProperties": False,
            }
        ),
    )
)
print(res.choices[0].message.content)
print(res.usage)
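
Since the schema constrains the output to a JSON object, the returned content should be parseable directly. The sketch below shows one way to check a response string against the same constraints by hand. The `check_address` helper and the `sample_content` string are hypothetical (the string is a made-up stand-in for `res.choices[0].message.content`, not real model output), and the checks are hand-written for this one schema rather than a general JSON Schema validator.

```python
import json
import re

# Made-up stand-in for res.choices[0].message.content; not real model output.
sample_content = (
    '{"street": "1 Main St", "city": "Springfield", "state": "IL", "zip": 62701}'
)

REQUIRED = ("street", "city", "state", "zip")


def check_address(text):
    """Return a list of violations of the address schema (empty if none)."""
    data = json.loads(text)
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for key in REQUIRED:
        if key not in data:
            problems.append(f"missing required property: {key}")
    for key in data:
        if key not in REQUIRED:  # additionalProperties is False
            problems.append(f"unexpected property: {key}")
    for key in ("street", "city", "state"):
        if key in data and not isinstance(data[key], str):
            problems.append(f"{key} is not a string")
    state = data.get("state")
    if isinstance(state, str) and not re.fullmatch(r"[A-Z]{2}", state):
        problems.append("state does not match ^[A-Z]{2}$")
    if "zip" in data:
        zip_code = data["zip"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(zip_code, bool) or not isinstance(zip_code, int):
            problems.append("zip is not an integer")
        elif not 10000 <= zip_code <= 99999:
            problems.append("zip is out of range")
    return problems


print(check_address(sample_content))
```

With a grammar-constrained run this list is expected to be empty; a check like this is mainly useful when comparing against unconstrained output.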