You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When start Xinference server and you hit the error "ValueError: Model architectures ['Qwen2ForCausalLM'] failed to be inspected. Please check the logs for more details. "
115
+
116
+
The logs shows the error, ``"Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp-a34b3233.so.1 library. Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it."``
117
+
118
+
This is mostly because your NumPy is installed by conda and conda's Numpy is built with Intel MKL optimizations, which is causing a conflict with the GNU OpenMP library (libgomp) that's already loaded in the environment.
119
+
120
+
.. code-block:: text
121
+
122
+
MKL_THREADING_LAYER=GNU xinference-local
123
+
124
+
Setting ``MKL_THREADING_LAYER=GNU`` forces Intel's Math Kernel Library to use GNU's OpenMP implementation instead of Intel's own implementation.
125
+
126
+
Or you can uninstall conda's numpy and reinstall with pip.
127
+
128
+
On a related subject, if you use vllm, do not install pytorch with conda, check https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html for detailed information.
0 commit comments