Commit fff2665

Audio input support (Phi 4 multimodal) (#1448)
* Deps
* Add conformer
* Nemo loading
* Position embeds
* Load t5 attn bias
* Attn and feed forward
* Add conv module and glu pointwise
* Implement relative attn bias
* Add the forward methods
* Add encoder embedding
* Fix oproj
* Some loading
* Conformer loads!
* Fully loading speech stack
* Merger
* Dont need that
* First pass at audio processing
* Read samples
* Optional
* Small loading fix
* Runs but not correct yet
* Improved audio processing?
* Works with this
* Fix t5 attn bias
* It works!
* Comment
* Use some other crates
* Clippy
* Allow bf16 on metal
* Add prefix_audio
* Remove unused
* Typo
* User specified
* Add audio url parsing
* AudioProjectionMode -> InputMode
* Audio prefix caching
* Fix bug in audio prefix caching
* Support both at the same time!
* Tweak logging
* Support stereo
* Add mistralrs-audio
* Support batching
* Add server and rust api example
* Add python api
* Fix add_multimodal_message
* Fix unfold for conformer
* Streaming example
* Add web chat support
* Add modalities registry
1 parent 2cb0a3e commit fff2665

57 files changed, with 3850 additions and 157 deletions

.typos.toml

Lines changed: 3 additions & 1 deletion
@@ -10,7 +10,9 @@ extend-ignore-identifiers-re = [
 "thr",
 "nd",
 "uneeded",
-"tese"
+"tese",
+"seperable",
+"Seperable",
 ]
 
 [files]

Cargo.lock

Lines changed: 230 additions & 0 deletions
Generated file; the diff is not rendered.

Cargo.toml

Lines changed: 8 additions & 0 deletions
@@ -10,6 +10,7 @@ members = [
 "mistralrs-quant",
 "mistralrs-paged-attn",
 "mistralrs-web-chat",
+"mistralrs-audio",
 ]
 resolver = "2"
 
@@ -140,6 +141,7 @@ ahash = "0.8.12"
 num-traits = "0.2.19"
 libc = "0.2.172"
 bm25 = "2.2.1"
+symphonia = { version = "0.5.4", default-features = false, features = ["mp3", "flac", "vorbis", "wav", "isomp4", "ogg", "pcm"] }
 lazy_static = "1.5"
 paste = "1.0.15"
 byteorder = "1.5.0"
@@ -159,12 +161,18 @@ include_dir = "0.7.4"
 http = "1.3.1"
 hyper = "1.6.0"
 bindgen_cuda = { git = "https://github.com/guoqingbao/bindgen_cuda.git", version = "0.1.6" }
+rubato = "0.16.2"
+rustfft = "6.3.0"
+hound = "3.5.1"
+apodize = "1.0.0"
+
 mistralrs-core = { path = "mistralrs-core" }
 mistralrs-paged-attn = { path = "mistralrs-paged-attn" }
 mistralrs-quant = { path = "mistralrs-quant" }
 mistralrs-vision = { path = "mistralrs-vision" }
 mistralrs-server-core = { path = "mistralrs-server-core" }
 mistralrs = { path = "mistralrs" }
+mistralrs-audio = { path = "mistralrs-audio" }
 
 [profile.release-with-debug]
 inherits = "release"
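
Note on the new audio dependencies: the crates added above cover decoding (symphonia, hound), resampling (rubato), window functions (apodize), and FFTs (rustfft), and the commit message mentions "Support stereo". The source files are not shown here, so the following is only a minimal, std-only sketch of two preprocessing steps such a stack typically needs: averaging interleaved stereo samples down to mono, and building a Hann window. The function names `downmix_to_mono` and `hann_window` are hypothetical and are not taken from the commit, and the sketch deliberately does not call any of the crates listed above.

```rust
/// Average interleaved multi-channel samples down to a single mono channel.
/// Hypothetical helper for illustration; not the commit's implementation.
/// Any trailing partial frame (fewer than `channels` samples) is dropped.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Build a symmetric Hann window of length `n`, computed by hand
/// (assumption: a real pipeline would likely take this from a window crate).
fn hann_window(n: usize) -> Vec<f32> {
    if n < 2 {
        // Degenerate lengths: nothing to taper.
        return vec![1.0; n];
    }
    (0..n)
        .map(|i| {
            let phase = 2.0 * std::f32::consts::PI * i as f32 / (n - 1) as f32;
            0.5 * (1.0 - phase.cos())
        })
        .collect()
}

fn main() {
    // Three stereo frames, interleaved as L, R, L, R, ...
    let stereo = [1.0_f32, 0.0, 0.5, 0.5, -1.0, 1.0];
    let mono = downmix_to_mono(&stereo, 2);
    println!("mono samples: {mono:?}");

    // Taper the mono frame before a spectral transform.
    let window = hann_window(mono.len());
    let tapered: Vec<f32> = mono.iter().zip(&window).map(|(s, w)| s * w).collect();
    println!("windowed frame: {tapered:?}");
}
```

Averaging channels is only one possible downmix policy; whether the commit averages, sums, or selects a channel cannot be determined from the diff shown here.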

README.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ Blazingly fast LLM inference.
 | <a href="https://ericlbuehler.github.io/mistral.rs/mistralrs/"><b>Rust Documentation</b></a> | <a href="https://github.com/EricLBuehler/mistral.rs/blob/master/mistralrs-pyo3/API.md"><b>Python Documentation</b></a> | <a href="https://discord.gg/SZrecqK8qw"><b>Discord</b></a> | <a href="https://matrix.to/#/#mistral.rs:matrix.org"><b>Matrix</b></a> |
 </p>
 
-Mistral.rs is a cross-platform, highly multimodal inference engine featuring support for **text**, **vision**, **image generation**, and **speech generation** models!
+Mistral.rs is a cross-platform, highly-multimodal inference engine that brings you local **text→text**, **text + vision→text**, **text + speech + vision→text**, **text→speech**, and **text→image** workflows — all in one blazing-fast package!
 
 Please submit requests for new models [here](https://github.com/EricLBuehler/mistral.rs/issues/156).
 