
OpenVINO CPU Execution Accelerator Fails with "Device with 'NPU' name is not registered" Error #300

Open
alib1513 opened this issue Mar 20, 2025 · 1 comment

@alib1513

Description
In the ONNX Runtime backend of Triton Server, when trying to load an ONNX model with OpenVINO as the CPU execution accelerator, the server throws an error indicating that a device with the "NPU" name is not registered in the OpenVINO Runtime. This occurs despite the device_type parameter being explicitly set to "CPU" in the model configuration, as documented here.

Triton Information
Triton Server Version: 2.55.0 (Docker image: nvcr.io/nvidia/tritonserver:25.02-py3)
Backend: ONNX Runtime with OpenVINO execution accelerator

To Reproduce

  1. Set up Triton Server using the Docker image nvcr.io/nvidia/tritonserver:25.02-py3.
  2. Use an ONNX model and add the following OpenVINO CPU optimization settings in the config.pbtxt file:
   optimization {
     execution_accelerators {
       cpu_execution_accelerator [{
         name: "openvino"
         parameters { key: "device_type" value: "CPU" }
       }]
     }
   }
  3. Launch the Triton Server; the model fails to load, and the server logs the following error:
I0319 23:51:53.570075 144 model_lifecycle.cc:473] "loading: onnx_model:1"
I0319 23:51:53.572942 144 onnxruntime.cc:2900] "TRITONBACKEND_Initialize: onnxruntime"
I0319 23:51:53.572971 144 onnxruntime.cc:2910] "Triton TRITONBACKEND API version: 1.19"
I0319 23:51:53.572984 144 onnxruntime.cc:2916] "'onnxruntime' TRITONBACKEND API version: 1.19"
I0319 23:51:53.572997 144 onnxruntime.cc:2946] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"false\",\"backend-directory\":\"/opt/tritonserver/backends\",\"device_type\":\"CPU\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I0319 23:51:53.594733 144 onnxruntime.cc:3011] "TRITONBACKEND_ModelInitialize: onnx_model (version 1)"
I0319 23:51:53.595868 144 onnxruntime.cc:3076] "TRITONBACKEND_ModelInstanceInitialize: onnx_model_0_0 (CPU device 0)"
2025-03-19 23:51:53.610138949 [W:onnxruntime:log, openvino_provider_factory.cc:209 operator()] Empty OV Config Map passed. Skipping load_config option parsing.

2025-03-19 23:51:53.746170745 [E:onnxruntime:, inference_session.cc:2117 operator()] Exception during initialization: Exception from src/inference/src/cpp/core.cpp:266:
Exception from src/inference/src/dev/core_impl.cpp:609:
Device with "NPU" name is not registered in the OpenVINO Runtime


I0319 23:51:53.756724 144 onnxruntime.cc:3128] "TRITONBACKEND_ModelInstanceFinalize: delete instance state"
E0319 23:51:53.756980 144 backend_model.cc:692] "ERROR: Failed to create instance: onnx runtime error 6: Exception during initialization: Exception from src/inference/src/cpp/core.cpp:266:\nException from src/inference/src/dev/core_impl.cpp:609:\nDevice with \"NPU\" name is not registered in the OpenVINO Runtime\n\n"
I0319 23:51:53.757025 144 onnxruntime.cc:3052] "TRITONBACKEND_ModelFinalize: delete model state"
E0319 23:51:53.757088 144 model_lifecycle.cc:654] "failed to load 'onnx_model' version 1: Internal: onnx runtime error 6: Exception during initialization: Exception from src/inference/src/cpp/core.cpp:266:\nException from src/inference/src/dev/core_impl.cpp:609:\nDevice with \"NPU\" name is not registered in the OpenVINO Runtime\n\n"
I0319 23:51:53.757130 144 model_lifecycle.cc:789] "failed to load 'onnx_model'"
I0319 23:51:53.757248 144 server.cc:604] 

Despite the machine having no NPU, and despite "CPU" being explicitly specified as the device type in the configuration, the runtime still references an "NPU" device, causing this error.
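
For reference, a quick way to check which devices this OpenVINO Runtime build actually registers (a minimal sketch, assuming the openvino Python package is available inside the container):

   import openvino as ov

   # List the device names registered with the OpenVINO Runtime.
   # On this machine the output should include "CPU" but not "NPU",
   # consistent with the error above.
   print(ov.Core().available_devices)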

Expected behavior
The ONNX model should load successfully, with OpenVINO optimizing inference on the CPU. No errors related to an "NPU" device should occur.
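
As an additional data point, the same device_type option passed directly to ONNX Runtime's OpenVINO execution provider outside Triton should pin execution to the CPU; a minimal sketch (assuming an onnxruntime build with the OpenVINO EP, e.g. onnxruntime-openvino, and a hypothetical model path):

   import onnxruntime as ort

   # Hypothetical standalone check: request the OpenVINO EP with
   # device_type explicitly set to CPU.
   sess = ort.InferenceSession(
       "model.onnx",  # hypothetical path
       providers=["OpenVINOExecutionProvider"],
       provider_options=[{"device_type": "CPU"}],
   )
   print(sess.get_providers())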

@fighterhit

PTAL @mc-nv
