
kantv-1.6.10

@jeffzhou2000 released this 24 May 12:30
· 13 commits to master since this release
198620c

Changes

  • enable Markdown rendering for LLM inference results (this behaves exactly like the web UI in llama-server)
  • enable self-contained build with internal customized toolchains #325
  • add a prebuilt APK, built via GitHub Actions, for Android phones without a Qualcomm mobile SoC
  • sync with upstream llama.cpp & whisper.cpp
  • ggml-hexagon: fix bug "LLM inference with cDSP can't work" #330
  • realtime video recognition on Android phone, powered by MTMD in upstream llama.cpp #341
  • re-enable stable-diffusion
  • improve stability
  • refine docs (top-level README.md, how-to-build.md, and other docs)
  • enable release builds after fixing a long-standing issue
  • All Features

    • TV playback
    • TV playback + TV recording
    • local playback with recorded video file
    • TV playback + AI-subtitle (powered by whisper.cpp)
    • TV playback + TV recording + AI-subtitle
    • ASR inference (through internally customized whisper.cpp) benchmark with ggml backend & Hexagon-cDSP backend
    • LLM inference (through internally customized llama.cpp) benchmark with ggml backend & Hexagon-cDSP backend
    • multi-modal (image-2-text) LLM inference (through internally customized llama.cpp) benchmark with ggml backend & Hexagon-cDSP backend
    • text-2-image inference (through internally customized stable-diffusion.cpp) with ggml backend
    • 2D graphic benchmark
    • video encode benchmark
    • video encode benchmark and create code-generated video file
    • local playback with code-generated video file
    • download LLM models (in local dev environments)
    • edit tv.xml (i.e. customize tv.xml for personal needs or R&D activity)
    • realtime video recognition via MTMD in upstream llama.cpp + SmolVLM2-256M with the default ggml backend

    Prebuilt Android APKs for Qualcomm Snapdragon-based phones and non-Qualcomm phones

    Dev envs

    internally customized/tailored QNN SDK and Hexagon SDK, plus the official Android NDK; all of these can be downloaded and set up automatically by a hand-written script in this project. Details can be seen in how-to-build.md.

    official Android Studio Jellyfish | 2023.3.1 (April 30, 2024). Android Studio can be skipped by AI researchers/experts (vim + VS Code are more lightweight tools); details can be seen in how-to-build.md.

    QNN SDK is v2.34.0.250424, Hexagon SDK is v6.2.0.1, Android NDK is android-ndk-r28.

    Running on Android phone

    • Android 7.0 (2016.08) through Android 15 (2024.10) and higher, on ANY mainstream arm64 mobile SoC, should be supported; issue reports are welcome.

    • An Android smartphone equipped with one of the Qualcomm mobile SoCs below (Snapdragon 8 Gen 3 and 8 Elite are highly recommended) is required to verify/run the ggml-hexagon backend on an Android phone:

    Snapdragon 8 Gen 1
    Snapdragon 8+ Gen 1
    Snapdragon 8 Gen 2
    Snapdragon 8 Gen 3(verified)
    Snapdragon 8 Elite(verified)

    • An Android smartphone equipped with ANY mainstream high-end mobile SoC is highly recommended for the realtime AI-subtitle feature; otherwise unexpected behavior may occur.
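
The device requirements above can be summarized as a small decision function. The following is only a minimal sketch, not code from this project: the function name, the return labels, and the idea of passing the SoC as a plain string are all assumptions made for illustration. The only facts it relies on are the SoC list above, the arm64 requirement, and that Android 7.0 corresponds to API level 24.

```python
# Hypothetical pre-flight check; names and return labels are illustrative only
# and are not part of the kantv code base.

MIN_API_LEVEL = 24  # Android 7.0 corresponds to API level 24

# SoCs listed in these release notes for the ggml-hexagon backend.
# The boolean records whether the notes mark the SoC as "(verified)".
HEXAGON_SOCS = {
    "Snapdragon 8 Gen 1": False,
    "Snapdragon 8+ Gen 1": False,  # official name of the "8 Gen 1+" part
    "Snapdragon 8 Gen 2": False,
    "Snapdragon 8 Gen 3": True,
    "Snapdragon 8 Elite": True,
}


def pick_backend(api_level: int, abi: str, soc: str) -> str:
    """Classify a device according to the requirements in the release notes."""
    # Anything older than Android 7.0 or not arm64 is outside the stated range.
    if api_level < MIN_API_LEVEL or abi != "arm64-v8a":
        return "unsupported"
    # Non-listed SoCs can still use the default ggml backend.
    if soc not in HEXAGON_SOCS:
        return "ggml"
    return "hexagon-verified" if HEXAGON_SOCS[soc] else "hexagon-unverified"


if __name__ == "__main__":
    print(pick_backend(34, "arm64-v8a", "Snapdragon 8 Gen 3"))
```

A device classified as "ggml" can still run every feature listed above with the default backend; only the Hexagon-cDSP benchmarks need one of the listed Snapdragon SoCs.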

    AI models

    default ASR model is the whisper model ggml-tiny.en-q8_0.bin, which is built into the APK.
    default LLM model is Gemma3-4B, which has the best overall performance on my Snapdragon 8 Gen 3 and 8 Elite phones.
    default Multimodal model is SmolVLM2-256M
    default Text2Image model is sd-v1.4

    verified LLM models: Qwen1.5-1.8B, Qwen2.5-3B, Qwen3-4B, Qwen3-8B, Qwen3-14B, Gemma3-4B, Gemma3-12B
    verified Multimodal models: SmolVLM2-256M, SmolVLM-500M
    verified Text2Image models: sd-v1.4

    all these AI models can be downloaded manually in "LLM Setting" (direct access to https://huggingface.co/ is required to download AI models in the APK).
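
For a local dev environment, a model file can also be fetched from Hugging Face outside the APK. The sketch below is an assumption-laden illustration, not the app's download logic: the release notes do not say which repositories the APK downloads from, so the repository id and file name in the usage comment are placeholders. It relies only on the public `https://huggingface.co/<repo>/resolve/<revision>/<file>` URL pattern.

```python
from urllib.parse import quote
from urllib.request import urlretrieve

HF_BASE = "https://huggingface.co"


def hf_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct-download URL for one file in a Hugging Face repo."""
    # quote() keeps "/" by default, so "org/name" repo ids stay intact.
    return (
        f"{HF_BASE}/{quote(repo_id)}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


def download_model(repo_id: str, filename: str, dest: str) -> str:
    """Download one model file to dest; needs direct access to huggingface.co."""
    urlretrieve(hf_file_url(repo_id, filename), dest)
    return dest


# Usage (placeholder repo id and file name, replace with a real model repo):
# download_model("example-org/example-model-gguf", "model-q4_0.gguf", "model.gguf")
```

Large model files can take a long time over `urlretrieve`; a resumable downloader may be preferable for multi-gigabyte files.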

    Todo

    • an automated CT approach should be introduced in this project to validate every PR/release. Can AI be used for this purpose?
    • solve the deadlock in ndkcamera: nihui/ncnn-android-scrfd#16
    • remove redundant and unnecessary permissions from the APK (the permissions to send and receive SMS are not actually used or needed) to strictly follow the principle of "minimum permissions" and the EU's GDPR.
    • sync with upstream llama.cpp & whisper.cpp
    • integrate the new MTMD (with audio support) in upstream llama.cpp
    • implement a dedicated feature (download AI models in the APK directly) for users/developers in China.
    • others can be found in the roadmap