
Gemma3n unable to run #13248

Open
@MXS-Jun

Description


Ollama version

ollama-intel-2.3.0b20250630-ubuntu.tgz

Environment

  • Windows: Windows 11 LTSC 2024
  • WSL2: Ubuntu-22.04
  • GPU: Intel(R) Arc(TM) 140T GPU (48GB)
  • CPU: Intel(R) Core(TM) Ultra 9 285H
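
A quick way to confirm the Arc GPU is visible from inside WSL2, assuming the oneAPI runtime's sycl-ls utility is installed (its output should match the "Found 1 SYCL devices" table in the log below):

sycl-ls
# Expected to list a line resembling the device in the log, e.g.:
# [level_zero:gpu:0] Intel(R) Level-Zero, Intel Graphics [0x7d51] ...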

More info

I use WSL2 to run ollama-intel-2.3.0b20250630-ubuntu.

qwen2.5:latest, qwen3:latest, and bge-m3:latest all run well; only gemma3n fails to load, crashing with the panic shown in the log below.
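
A minimal reproduction sketch (the launcher name and the model tag gemma3n are assumptions; the server directory is taken from the "starting llama server" line in the log):

# Start the IPEX-LLM Ollama server from its extracted directory
cd ~/ollama-intel
./ollama serve &

# Any generate request against gemma3n crashes the runner;
# the same request against qwen2.5, qwen3, or bge-m3 succeeds
ollama run gemma3n "hello"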

Log info

time=2025-07-04T11:50:24.280+08:00 level=INFO source=server.go:135 msg="system memory" total="46.8 GiB" free="45.6 GiB" free_swap="12.0 GiB"
time=2025-07-04T11:50:24.281+08:00 level=INFO source=server.go:187 msg=offload library=cpu layers.requested=-1 layers.model=36 layers.offload=0 layers.split="" memory.available="[45.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.0 GiB" memory.required.partial="0 B" memory.required.kv="280.0 MiB" memory.required.allocations="[5.0 GiB]" memory.weights.total="2.6 GiB" memory.weights.repeating="2.2 GiB" memory.weights.nonrepeating="420.4 MiB" memory.graph.full="2.0 GiB" memory.graph.partial="3.7 GiB"
time=2025-07-04T11:50:24.322+08:00 level=INFO source=server.go:458 msg="starting llama server" cmd="/home/jun/ollama-intel/ollama-bin runner --ollama-engine --model /home/jun/.ollama/models/blobs/sha256-38e8dcc30df4eb0e29eaf5c74ba6ce3f2cd66badad50768fc14362acfb8b8cb6 --ctx-size 4096 --batch-size 512 --n-gpu-layers 999 --threads 16 --no-mmap --parallel 2 --port 46347"
time=2025-07-04T11:50:24.323+08:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
time=2025-07-04T11:50:24.323+08:00 level=INFO source=server.go:618 msg="waiting for llama runner to start responding"
time=2025-07-04T11:50:24.324+08:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server not responding"
time=2025-07-04T11:50:24.358+08:00 level=INFO source=runner.go:925 msg="starting ollama engine"
time=2025-07-04T11:50:24.381+08:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:46347"
time=2025-07-04T11:50:24.412+08:00 level=INFO source=ggml.go:96 msg="" architecture=gemma3n file_type=Q4_K_M name="" description="" num_tensors=847 num_key_values=40
load_backend: loaded SYCL backend from /home/jun/ollama-intel/libggml-sycl.so
load_backend: loaded CPU backend from /home/jun/ollama-intel/libggml-cpu-alderlake.so
time=2025-07-04T11:50:24.477+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.LLAMAFILE=1 CPU.0.OPENMP=1 CPU.0.AARCH64_REPACK=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Running with Environment Variables:
  GGML_SYCL_DEBUG: 0
  GGML_SYCL_DISABLE_OPT: 1
  GGML_SYCL_DISABLE_GRAPH: 1
  GGML_SYCL_PRIORITIZE_DMMV: 0
Build with Macros:
  GGML_SYCL_FORCE_MMQ: no
  GGML_SYCL_F16: no
Found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                Intel Graphics [0x7d51]|  12.74|    128|    1024|   32| 54432M|         1.6.33578+15|
SYCL Optimization Feature:
|ID|        Device Type|Reorder|
|--|-------------------|-------|
| 0| [level_zero:gpu:0]|      Y|
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
time=2025-07-04T11:50:24.579+08:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server loading model"
ggml_backend_sycl_buffer_type_alloc_buffer: can't allocate 4697620480 Bytes of memory on device
alloc_tensor_range: failed to allocate SYCL0 buffer of size 4697620480
panic: insufficient memory - required allocations: {InputWeights:440832000A CPU:{Name:CPU UUID: Weights:[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U] Cache:[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U] Graph:0U} GPUs:[{Name:SYCL0 UUID: Weights:[72269184F 72269184F 72269184F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 72269184F 72269184F 72269184F 72269184F 5162939392F] Cache:[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U] Graph:0U}]}
 
goroutine 16 [running]:
github.com/ollama/ollama/ml/backend/ggml.New({0x7ffd2f12a2d5, 0x66}, {0x10, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
        /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/ml/backend/ggml/ggml.go:380 +0x30b6
github.com/ollama/ollama/ml.NewBackend({0x7ffd2f12a2d5, 0x66}, {0x10, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
        /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/ml/backend.go:209 +0xb1
github.com/ollama/ollama/model.New({0x7ffd2f12a2d5?, 0x0?}, {0x10, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
        /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/model/model.go:102 +0x8f
github.com/ollama/ollama/runner/ollamarunner.(*Server).initModel(0xc0004ca120, {0x7ffd2f12a2d5?, 0x0?}, {0x10, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0}, ...)
        /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:841 +0x8d
github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0xc0004ca120, {0x15f3a90, 0xc000592280}, {0x7ffd2f12a2d5?, 0x0?}, {0x10, 0x0, 0x3e7, {0x0, 0x0, ...}, ...}, ...)
        /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:878 +0xb8
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
        /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:959 +0xa11
time=2025-07-04T11:50:25.088+08:00 level=ERROR source=sched.go:489 msg="error loading llama server" error="llama runner process has terminated: exit status 2"
[GIN] 2025/07/04 - 11:50:25 | 500 |  926.687781ms |       127.0.0.1 | POST     "/api/generate"
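
The failing allocation is roughly 4.4 GiB (4697620480 bytes), while the device table above reports 54432M of global memory; note the get_memory_info warning, which says ext_intel_free_memory is unsupported and that total memory is used as free memory. Below is a minimal sketch of the workaround the warning itself suggests; whether it lets gemma3n load here is untested:

# Enable Level Zero SYSMAN so SYCL can query actual free memory
# (taken verbatim from the get_memory_info warning in the log)
export ZES_ENABLE_SYSMAN=1
# Restart the server so the runner process inherits the variable
./ollama serve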
