Changes
All Features
- TV playback
- TV playback + TV recording
- local playback of recorded video files
- TV playback + AI-subtitle (powered by whisper.cpp)
- TV playback + TV recording + AI-subtitle
- ASR inference (through an internal customized whisper.cpp) benchmark with the ggml backend & the Hexagon-cDSP backend
- LLM inference (through an internal customized llama.cpp) benchmark with the ggml backend & the Hexagon-cDSP backend
- multi-modal (image-to-text) LLM inference (through an internal customized llama.cpp) benchmark with the ggml backend & the Hexagon-cDSP backend
- text-to-image inference (through an internal customized stable-diffusion.cpp) with the ggml backend
- 2D graphics benchmark
- video encode benchmark
- video encode benchmark that creates a code-generated video file
- local playback of code-generated video files
- download LLM models (in local dev envs)
- edit tv.xml (i.e., customize tv.xml for personal needs or R&D activities)
- realtime video recognition via MTMD in upstream llama.cpp + SmolVLM2-256M with the default ggml backend
Prebuilt Android APK for Qualcomm Snapdragon based phones and non-Qualcomm based phones
- download the prebuilt Android APK from https://github.com/kantv-ai/kantv/actions/runs/15226969178
- unzip the downloaded file
- install the corresponding APK on an Android phone equipped with / without a Qualcomm mobile SoC
Dev envs
internal customized/tailored QNN SDK and Hexagon SDK, plus the official Android NDK; all of these can be automatically downloaded and set up by a hand-written script in this project. Details can be found in how-to-build.md.
official Android Studio Jellyfish (2023.3.1, April 30, 2024); Android Studio can be skipped by AI researchers/experts (vim + vscode are more lightweight tools). Details can be found in how-to-build.md.
QNN SDK is v2.34.0.250424, Hexagon SDK is v6.2.0.1, Android NDK is android-ndk-r28.
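The pinned NDK version above can be sanity-checked programmatically. A minimal Python sketch, relying on the `source.properties` file that every NDK install ships (the build number in the sample string is illustrative, not the exact r28 build):

```python
# Sketch: verify a pinned Android NDK version by parsing source.properties.
# The NDK ships a source.properties file containing a "Pkg.Revision" key;
# android-ndk-r28 corresponds to major revision 28.

def parse_pkg_revision(properties_text: str) -> str:
    """Extract the Pkg.Revision value from source.properties content."""
    for line in properties_text.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "Pkg.Revision":
            return value.strip()
    raise ValueError("Pkg.Revision not found")

def ndk_major_version(properties_text: str) -> int:
    """Return the major revision number, e.g. 28 for android-ndk-r28."""
    return int(parse_pkg_revision(properties_text).split(".")[0])

# Example content as it might appear in an r28 install (build number illustrative):
sample = "Pkg.Desc = Android NDK\nPkg.Revision = 28.0.13004108\n"
assert ndk_major_version(sample) == 28
```

In practice one would read `$ANDROID_NDK/source.properties` from disk and compare against the pinned major version before building.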
Running on Android phone
- Android 7.0 (2016.08) through Android 15 (2024.10) and higher, with ANY mainstream arm64 mobile SoC, should be supported; issue reports are welcome.
- an Android smartphone equipped with one of the Qualcomm mobile SoCs below (Snapdragon 8 Gen 3 and 8 Elite are highly recommended) is required for verifying/running the ggml-hexagon backend on an Android phone:
  - Snapdragon 8 Gen 1
  - Snapdragon 8 Gen 1+
  - Snapdragon 8 Gen 2
  - Snapdragon 8 Gen 3 (verified)
  - Snapdragon 8 Elite (verified)
- an Android smartphone equipped with ANY mainstream high-end mobile SoC is highly recommended for the realtime AI-subtitle feature; otherwise unexpected behavior may occur
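As a rough programmatic guard, the supported-SoC list above can be encoded and checked against the device's SoC model string (readable on-device via `android.os.Build.SOC_MODEL` on API 31+, or the `ro.soc.model` system property). A minimal Python sketch; note the SM84xx/SM85xx/SM86xx/SM87xx model codes are my assumed mapping of the marketing names, not taken from this project:

```python
# Sketch: map Snapdragon marketing names to SoC model codes and check
# whether a device qualifies for the ggml-hexagon backend.
# NOTE: the model codes below are assumed mappings, not from the project.

HEXAGON_CAPABLE_SOCS = {
    "SM8450": "Snapdragon 8 Gen 1",
    "SM8475": "Snapdragon 8 Gen 1+",
    "SM8550": "Snapdragon 8 Gen 2",
    "SM8650": "Snapdragon 8 Gen 3 (verified)",
    "SM8750": "Snapdragon 8 Elite (verified)",
}

def hexagon_backend_supported(soc_model: str) -> bool:
    """soc_model: value of Build.SOC_MODEL or the ro.soc.model property."""
    return soc_model.strip().upper() in HEXAGON_CAPABLE_SOCS
```

Devices outside this table would fall back to the default ggml backend.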
AI models
default ASR model is the whisper model ggml-tiny.en-q8_0.bin, built into the APK.
default LLM model is Gemma3-4B, which has the best overall performance on my Snapdragon 8 Gen 3 and 8 Elite phones.
default multimodal model is SmolVLM2-256M
default text-to-image model is sd-v1.4
verified LLM models: Qwen1.5-1.8B, Qwen2.5-3B, Qwen3-4B, Qwen3-8B, Qwen3-14B, Gemma3-4B, Gemma3-12B
verified multimodal models: SmolVLM2-256M, SmolVLM-500M
verified text-to-image models: sd-v1.4
all these AI models can be downloaded manually in "LLM Setting" (direct access to https://huggingface.co/ is required to download AI models in the APK)
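Outside the APK, the same model files can be fetched directly from Hugging Face using the standard `/resolve/main/` download path. A minimal Python sketch; the repo ID in the example is an illustrative guess at where the whisper GGML files live, not necessarily the exact source the APK uses:

```python
# Sketch: build a direct-download URL for a model file hosted on
# Hugging Face, using the standard /resolve/main/ path.
# The repo ID below is an illustrative assumption, not the APK's source.

def hf_download_url(repo_id: str, filename: str) -> str:
    """Return the direct-download URL for a file in a Hugging Face repo."""
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"

# Example: the built-in whisper ASR model (repo ID assumed):
url = hf_download_url("ggerganov/whisper.cpp", "ggml-tiny.en-q8_0.bin")
assert url == ("https://huggingface.co/ggerganov/whisper.cpp"
               "/resolve/main/ggml-tiny.en-q8_0.bin")
```

The resulting URL can be fed to `wget`/`curl` in local dev envs when the in-app downloader is unavailable.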
Todo
- an automated CT approach should be introduced in this project to validate every PR/release; can AI be used for this purpose?
- solve the deadlock in ndkcamera: nihui/ncnn-android-scrfd#16
- remove redundant and unnecessary APK permissions (the send/receive SMS permissions are not actually used/needed by the APK) to strictly follow the principle of "minimum permissions" and the EU's GDPR
- sync with upstream llama.cpp & whisper.cpp
- integrate the new MTMD (with audio support) from upstream llama.cpp
- implement a dedicated feature (download AI models directly in the APK) for users/developers from China
- other items can be found in the roadmap
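The permission-cleanup todo amounts to deleting the unused `<uses-permission>` entries from `AndroidManifest.xml`. A hedged sketch of the lines to drop, assuming the standard Android permission names for SMS:

```xml
<!-- Candidates for removal from AndroidManifest.xml: per the
     "minimum permissions" todo, the app does not actually use SMS. -->
<uses-permission android:name="android.permission.SEND_SMS" />
<uses-permission android:name="android.permission.RECEIVE_SMS" />
```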