Release kantv-1.6.10 · kantv-ai/kantv

Changes

enable Markdown rendering for LLM inference results (this feature behaves exactly the same as the webui in llama-server)

enable self-contained build with internal customized toolchains #325

add a prebuilt APK, built via GitHub Actions, for Android phones without a Qualcomm mobile SoC

sync with upstream llama.cpp and whisper.cpp

ggml-hexagon: fix bug "LLM inference with cDSP cann't works" #330

realtime video recognition on Android phone, powered by MTMD in upstream llama.cpp #341

re-enable stable-diffusion

improve stability

refine docs (top-level README.md, how-to-build.md, and other docs)

enable release build after fixing a long-standing issue

All Features

TV playback
TV playback + TV recording
local playback with recorded video file
TV playback + AI-subtitle (powered by whisper.cpp)
TV playback + TV recording + AI-subtitle
ASR inference (through internally customized whisper.cpp) benchmark with ggml backend & Hexagon-cDSP backend
LLM inference (through internally customized llama.cpp) benchmark with ggml backend & Hexagon-cDSP backend
multi-modal (image-2-text) LLM inference (through internally customized llama.cpp) benchmark with ggml backend & Hexagon-cDSP backend
text-2-image inference (through internally customized stable-diffusion.cpp) with ggml backend
2D graphic benchmark
video encode benchmark
video encode benchmark and create code-generated video file
local playback with code-generated video file
download LLM models (in local dev envs)
edit tv.xml (i.e. customize tv.xml for personal needs or R&D activity)
realtime video recognition via MTMD in upstream llama.cpp + SmolVLM2-256M with the default ggml backend

Prebuilt Android APK for Qualcomm Snapdragon based phones and non-Qualcomm phones

download prebuilt Android APK from
https://github.com/kantv-ai/kantv/actions/runs/15226969178
unzip the downloaded file
install the APK that matches the Android phone (with or without a Qualcomm mobile SoC)
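
The unzip and install steps above can also be scripted. The following is a minimal Python sketch, not part of the project. It assumes the artifact has already been downloaded by hand, that `adb` is on the PATH with a phone connected, and that the archive contains one or more `.apk` files. The function names and the archive name used in the note below are hypothetical.

```python
import subprocess
import zipfile
from pathlib import Path


def extract_apks(archive: Path, dest: Path) -> list[Path]:
    """Unzip the downloaded artifact and return the APK files it contains."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    # Search recursively, since the archive layout is not known in advance.
    return sorted(dest.rglob("*.apk"))


def install_apk(apk: Path) -> None:
    """Install one APK on the connected phone; -r replaces an existing install."""
    subprocess.run(["adb", "install", "-r", str(apk)], check=True)
```

For example, `extract_apks(Path("kantv-apk.zip"), Path("kantv-apk"))` returns the extracted APK paths; pass the one that matches the phone to `install_apk`.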

Dev envs

internally customized/tailored QNN SDK and Hexagon SDK, plus the official Android NDK. All of these can be downloaded and set up automatically by a hand-written script in this project; details can be found in how-to-build.md.

official Android Studio Jellyfish | 2023.3.1 (April 30, 2024). Android Studio can be skipped by AI researchers/experts (vim + vscode are more lightweight tools); details can be found in how-to-build.md.

QNN SDK is v2.34.0.250424, Hexagon SDK is v6.2.0.1, Android NDK is android-ndk-r28.

Running on Android phone

Android 7.0 (2016.08) through Android 15 (2024.10) and higher versions, with ANY mainstream arm64 mobile SoC, should be supported. Issue reports are welcome.
An Android smartphone equipped with one of the Qualcomm mobile SoCs below (Qualcomm Snapdragon 8Gen3 and 8Elite are highly recommended) is required to verify/run the ggml-hexagon backend on an Android phone:

Snapdragon 8 Gen 1
Snapdragon 8+ Gen 1
Snapdragon 8 Gen 2
Snapdragon 8 Gen 3(verified)
Snapdragon 8 Elite(verified)
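
As a rough illustration of the requirements above, the sketch below checks a device description against them. It is only a sketch under stated assumptions: the function names and the idea of matching on marketing names are hypothetical, and the app itself may detect the SoC differently. The API levels (24 for Android 7.0, 35 for Android 15) are the standard Android values.

```python
# SoCs listed in this release as required for the ggml-hexagon backend.
HEXAGON_SOCS = {
    "Snapdragon 8 Gen 1",
    "Snapdragon 8+ Gen 1",  # Qualcomm's official spelling of the second entry
    "Snapdragon 8 Gen 2",
    "Snapdragon 8 Gen 3",
    "Snapdragon 8 Elite",
}

# Subset marked "(verified)" in the list above.
VERIFIED_SOCS = {"Snapdragon 8 Gen 3", "Snapdragon 8 Elite"}

MIN_API_LEVEL = 24  # Android 7.0; Android 15 is API level 35


def can_run_app(api_level: int, abi: str) -> bool:
    """Android 7.0 or newer on an arm64 SoC should be supported."""
    return api_level >= MIN_API_LEVEL and abi == "arm64-v8a"


def hexagon_status(soc_name: str) -> str:
    """Classify a SoC marketing name for the ggml-hexagon backend."""
    if soc_name in VERIFIED_SOCS:
        return "verified"
    if soc_name in HEXAGON_SOCS:
        return "listed"
    return "unsupported"
```

A phone that passes `can_run_app` but reports `"unsupported"` here can still use the default ggml backend; only the ggml-hexagon backend needs one of the listed SoCs.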

An Android smartphone equipped with ANY mainstream high-end mobile SoC is highly recommended for the realtime AI-subtitle feature; otherwise unexpected behavior may occur.

AI models

default ASR model is the whisper model ggml-tiny.en-q8_0.bin, which is built into the APK.
default LLM model is Gemma3-4B, which gives the best overall performance on my Snapdragon 8Gen3 phone and 8Elite phone.
default Multimodal model is SmolVLM2-256M
default Text2Image model is sd-v1.4

verified LLM models: Qwen1.5-1.8B, Qwen2.5-3b, Qwen3-4B, Qwen3-8B, Qwen3-14B, Gemma3-4B, Gemma3-12B
verified Multimodal models: SmolVLM2-256M, SmolVLM-500M
verified Text2Image models: sd-v1.4

all these AI models can be downloaded manually in "LLM Setting" (direct access to https://huggingface.co/ is required to download AI models in the APK).

Todo

an automated CT approach should be introduced in this project to validate every PR/release. Can AI be used for this purpose?
solve the deadlock in ndkcamera: nihui/ncnn-android-scrfd#16
remove redundant and unnecessary permissions of the APK (the permissions to send and receive SMS are not actually used or needed in the APK) to strictly follow the principle of "minimum permissions" and the EU's GDPR.
sync with upstream llama.cpp and whisper.cpp
integrate the new MTMD (with audio support) from upstream llama.cpp
implement a dedicated feature (download AI models directly in the APK) for users/developers in China.
others can be found at roadmap
