[Feature] Phi-4-MM support #6544

Open
2 of 7 tasks
lifuhuang opened this issue May 23, 2025 · 0 comments
lifuhuang commented May 23, 2025

Update

Basic text + vision support is already in main. However, there are many known issues across the board.

Known limitations (see the Execution Plan below for the full list):

  1. Audio capabilities: audio input is not supported at all yet.
  2. LoRA / Image quality: Phi4MM depends on LoRA for full image capability, but there are some compatibility issues with the native SGL LoRA solution. We are working on solving them by refactoring / generalizing SGL LoRA capabilities. (Fixed with Refactor LoRA handling to support adapter tensors in fused format #6585, Fix incorrect LoRA weight loading for fused gate_up_proj #6734, and Support LoRA in TestOpenAIVisionServer and fix fused kv_proj loading bug #6861.)
  3. Image tokens: Phi4MM supports two image token conventions (<|image1|> and <|endoftext10|>); currently we only support the latter. If you use the default chat template, it will automatically pick the supported one.
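If you build prompts by hand rather than through the default chat template, a workaround for limitation 3 is to rewrite `<|image1|>`-style placeholders into the supported `<|endoftext10|>` token before sending the request. The helper below is a hypothetical sketch, not part of SGLang:

```python
import re

# Phi-4-MM accepts two image-placeholder conventions, <|imageN|> and
# <|endoftext10|>; SGLang currently handles only the latter, so normalize
# hand-built prompts before submitting them. (Hypothetical helper.)
_IMAGE_TAG = re.compile(r"<\|image\d+\|>")

def normalize_phi4mm_prompt(prompt: str) -> str:
    """Replace <|imageN|> placeholders with the supported <|endoftext10|> token."""
    return _IMAGE_TAG.sub("<|endoftext10|>", prompt)

print(normalize_phi4mm_prompt("<|image1|> Describe this picture."))
# → <|endoftext10|> Describe this picture.
```

Prompts produced by the default chat template do not need this step, since the template already emits the supported token.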

Motivation

Support the Phi4 Multimodal model (https://huggingface.co/microsoft/Phi-4-multimodal-instruct) in SGL.

Execution Plan:

Related resources

No response
