
Gemma3n unable to run #13248

Open
@MXS-Jun

Description


Ollama version

ollama-intel-2.3.0b20250630-ubuntu.tgz

Environment

  • Windows: Windows 11 LTSC 2024
  • WSL2: Ubuntu-22.04
  • GPU: Intel(R) Arc(TM) 140T GPU (48GB)
  • CPU: Intel(R) Core(TM) Ultra 9 285H
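
A quick way to confirm the Arc GPU is visible from inside WSL2, assuming the oneAPI runtime's sycl-ls utility is installed (its output should match the "Found 1 SYCL devices" table in the log below):

sycl-ls
# Expected to list a line resembling the device in the log, e.g.:
# [level_zero:gpu:0] Intel(R) Level-Zero, Intel Graphics [0x7d51] ...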

More info

I use WSL2 to run ollama-intel-2.3.0b20250630-ubuntu.

qwen2.5:latest, qwen3:latest, and bge-m3:latest all run well; only gemma3n fails to load, crashing with the panic shown in the log below.
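
A minimal reproduction sketch (the launcher name and the model tag gemma3n are assumptions; the server directory is taken from the "starting llama server" line in the log):

# Start the IPEX-LLM Ollama server from its extracted directory
cd ~/ollama-intel
./ollama serve &

# Any generate request against gemma3n crashes the runner;
# the same request against qwen2.5, qwen3, or bge-m3 succeeds
ollama run gemma3n "hello"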

Log info

time=2025-07-04T11:50:24.280+08:00 level=INFO source=server.go:135 msg="system memory" total="46.8 GiB" free="45.6 GiB" free_swap="12.0 GiB"
time=2025-07-04T11:50:24.281+08:00 level=INFO source=server.go:187 msg=offload library=cpu layers.requested=-1 layers.model=36 layers.offload=0 layers.split="" memory.available="[45.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.0 GiB" memory.required.partial="0 B" memory.required.kv="280.0 MiB" memory.required.allocations="[5.0 GiB]" memory.weights.total="2.6 GiB" memory.weights.repeating="2.2 GiB" memory.weights.nonrepeating="420.4 MiB" memory.graph.full="2.0 GiB" memory.graph.partial="3.7 GiB"
time=2025-07-04T11:50:24.322+08:00 level=INFO source=server.go:458 msg="starting llama server" cmd="/home/jun/ollama-intel/ollama-bin runner --ollama-engine --model /home/jun/.ollama/models/blobs/sha256-38e8dcc30df4eb0e29eaf5c74ba6ce3f2cd66badad50768fc14362acfb8b8cb6 --ctx-size 4096 --batch-size 512 --n-gpu-layers 999 --threads 16 --no-mmap --parallel 2 --port 46347"
time=2025-07-04T11:50:24.323+08:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
time=2025-07-04T11:50:24.323+08:00 level=INFO source=server.go:618 msg="waiting for llama runner to start responding"
time=2025-07-04T11:50:24.324+08:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server not responding"
time=2025-07-04T11:50:24.358+08:00 level=INFO source=runner.go:925 msg="starting ollama engine"
time=2025-07-04T11:50:24.381+08:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:46347"
time=2025-07-04T11:50:24.412+08:00 level=INFO source=ggml.go:96 msg="" architecture=gemma3n file_type=Q4_K_M name="" description="" num_tensors=847 num_key_values=40
load_backend: loaded SYCL backend from /home/jun/ollama-intel/libggml-sycl.so
load_backend: loaded CPU backend from /home/jun/ollama-intel/libggml-cpu-alderlake.so
time=2025-07-04T11:50:24.477+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.LLAMAFILE=1 CPU.0.OPENMP=1 CPU.0.AARCH64_REPACK=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Running with Environment Variables:
  GGML_SYCL_DEBUG: 0
  GGML_SYCL_DISABLE_OPT: 1
  GGML_SYCL_DISABLE_GRAPH: 1
  GGML_SYCL_PRIORITIZE_DMMV: 0
Build with Macros:
  GGML_SYCL_FORCE_MMQ: no
  GGML_SYCL_F16: no
Found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                Intel Graphics [0x7d51]|  12.74|    128|    1024|   32| 54432M|         1.6.33578+15|
SYCL Optimization Feature:
|ID|        Device Type|Reorder|
|--|-------------------|-------|
| 0| [level_zero:gpu:0]|      Y|
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
time=2025-07-04T11:50:24.579+08:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server loading model"
ggml_backend_sycl_buffer_type_alloc_buffer: can't allocate 4697620480 Bytes of memory on device
alloc_tensor_range: failed to allocate SYCL0 buffer of size 4697620480
panic: insufficient memory - required allocations: {InputWeights:440832000A CPU:{Name:CPU UUID: Weights:[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U] Cache:[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U] Graph:0U} GPUs:[{Name:SYCL0 UUID: Weights:[72269184F 72269184F 72269184F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 72269184F 72269184F 72269184F 72269184F 5162939392F] Cache:[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U] Graph:0U}]}
 
goroutine 16 [running]:
github.com/ollama/ollama/ml/backend/ggml.New({0x7ffd2f12a2d5, 0x66}, {0x10, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
        /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/ml/backend/ggml/ggml.go:380 +0x30b6
github.com/ollama/ollama/ml.NewBackend({0x7ffd2f12a2d5, 0x66}, {0x10, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
        /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/ml/backend.go:209 +0xb1
github.com/ollama/ollama/model.New({0x7ffd2f12a2d5?, 0x0?}, {0x10, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
        /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/model/model.go:102 +0x8f
github.com/ollama/ollama/runner/ollamarunner.(*Server).initModel(0xc0004ca120, {0x7ffd2f12a2d5?, 0x0?}, {0x10, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0}, ...)
        /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:841 +0x8d
github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0xc0004ca120, {0x15f3a90, 0xc000592280}, {0x7ffd2f12a2d5?, 0x0?}, {0x10, 0x0, 0x3e7, {0x0, 0x0, ...}, ...}, ...)
        /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:878 +0xb8
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
        /home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:959 +0xa11
time=2025-07-04T11:50:25.088+08:00 level=ERROR source=sched.go:489 msg="error loading llama server" error="llama runner process has terminated: exit status 2"
[GIN] 2025/07/04 - 11:50:25 | 500 |  926.687781ms |       127.0.0.1 | POST     "/api/generate"
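
The failing allocation is roughly 4.4 GiB (4697620480 bytes), while the device table above reports 54432M of global memory; note the get_memory_info warning, which says ext_intel_free_memory is unsupported and that total memory is used as free memory. Below is a minimal sketch of the workaround the warning itself suggests; whether it lets gemma3n load here is untested:

# Enable Level Zero SYSMAN so SYCL can query actual free memory
# (taken verbatim from the get_memory_info warning in the log)
export ZES_ENABLE_SYSMAN=1
# Restart the server so the runner process inherits the variable
./ollama serve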
