[MODEL REQUEST] requesting new model (Qwen3 Series (32B → 4B) for NPU-Optimized Inference with Tools/Function Calling & OpenAI API Compatibility on QAI-Hub) #195

@zytoh0

Is your feature request related to a problem? Please describe.
The current catalog of models on QAI-Hub optimized for NPU (Neural Processing Unit) acceleration lacks advanced, open-source models capable of powering tool-augmented, agentic, and function-calling workflows directly on device.

The latest Qwen3 series offers groundbreaking advancements in reasoning, code generation, multilingual understanding, and dynamic tool use, making them ideal candidates for edge AI scenarios where NPU inference is required.

Request: Add the Qwen3 models listed below, prioritized from largest to smallest for NPU-optimized deployment, to enable cutting-edge, on-device intelligent agents.

Details of models being requested (Ordered by Priority for NPU Deployment):

🔺 Highest Priority

Model Name: Qwen3-32B

Type: Dense

Source repo link: https://github.com/QwenLM/Qwen3

Use Case: NPU-accelerated intelligent agent with dynamic tool orchestration, large-scale multi-turn reasoning, multilingual interaction.

🔺 High Priority

Model Name: Qwen3-30B-A3B (MoE with 3B active parameters)

Type: Mixture of Experts

Source repo link: https://github.com/QwenLM/Qwen3

Use Case: Compute-efficient MoE (only ~3B parameters active per token), reducing per-token NPU compute while retaining strong reasoning and external tool use; note that the full 30B weights must still fit in memory.

⚪ Medium Priority

Model Name: Qwen3-14B

Type: Dense

Source repo link: https://github.com/QwenLM/Qwen3

Use Case: Mid-size agentic assistant for mobile/edge deployment with solid multi-step reasoning and OpenAI API function-calling workflows.

⚪ Medium Priority

Model Name: Qwen3-8B

Type: Dense

Source repo link: https://github.com/QwenLM/Qwen3

Use Case: Mobile-friendly assistant model for real-time reasoning and tool-augmented coding tasks, optimized for limited NPU memory.

🔻 Lower Priority

Model Name: Qwen3-4B

Type: Dense

Source repo link: https://github.com/QwenLM/Qwen3

Use Case: Lightweight fallback for very constrained NPU setups while retaining essential tool-calling and multilingual capabilities.

Additional Context for Requested Models:

  • Native dynamic thinking and non-thinking modes to optimize different reasoning workflows.
  • Fully compatible with OpenAI API standards including /v1/chat/completions and function/tool schemas.
  • Designed for dynamic external tool invocation, multi-turn dialogues, and intelligent agent workflows.
  • Quantized versions (such as Q4_K_M) may be required to maximize NPU compatibility without sacrificing functionality.
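To make the OpenAI-compatibility point concrete, here is a minimal sketch of the kind of /v1/chat/completions request body with a tool schema that these models would need to accept. The model id (`qwen3-8b-npu`) and the `get_weather` tool are illustrative placeholders, not real QAI-Hub identifiers.

```python
import json

# OpenAI-style tool (function) schema. The tool name and parameters here are
# hypothetical; any JSON-Schema-described function would follow this shape.
tool_schema = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Request body as it would be POSTed to /v1/chat/completions.
request_body = {
    "model": "qwen3-8b-npu",  # placeholder model id, not a real QAI-Hub name
    "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
    "tools": [tool_schema],
    "tool_choice": "auto",
}

print(json.dumps(request_body, indent=2))
```

A server claiming the compatibility described above should accept this payload unchanged from any standard OpenAI client.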

Key Requirements:

  • Full NPU Optimization for all models (quantization, kernel acceleration).
  • Support OpenAI API endpoints: /v1/chat/completions, /v1/models, /v1/completions.
  • Ensure Tools/Function Calling Support: parsing of OpenAI tool schemas, dynamic invocation with arguments.
  • Documentation Needed: how to run thinking vs. non-thinking mode; the quantization types used; memory usage; NPU compatibility constraints; and any performance trade-offs.
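The "dynamic invocation with arguments" requirement can be sketched as follows: parse an OpenAI-style tool call out of an assistant message and dispatch it to a local Python function. The `assistant_message` dict below is a hand-written stand-in for `response["choices"][0]["message"]` from a compatible server, and `get_weather` is a dummy tool.

```python
import json

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # dummy tool implementation

TOOLS = {"get_weather": get_weather}

# Stand-in for response["choices"][0]["message"] from /v1/chat/completions.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": json.dumps({"city": "Tokyo"}),
            },
        }
    ],
}

results = []
for call in assistant_message.get("tool_calls", []):
    fn = TOOLS[call["function"]["name"]]
    # In the OpenAI format, arguments arrive as a JSON-encoded string.
    args = json.loads(call["function"]["arguments"])
    # Feed the result back as a "tool" role message for the next turn.
    results.append({
        "role": "tool",
        "tool_call_id": call["id"],
        "content": fn(**args),
    })

print(results[0]["content"])  # -> Sunny in Tokyo
```

The loop-and-dispatch pattern is what the requested models' tool-calling support ultimately has to drive on device.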
