Description
What happened:
I installed NFD v0.17.3 using Helm and enabled the topology updater at install time, because I want the NUMA node topology of the node to be exposed. However, no NUMA details were added to the node labels, even though lscpu reports multiple NUMA nodes (snippet below).
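For reference, the installation was done roughly like this (the repo URL is the documented NFD chart repo; the release name and namespace are from my environment, and the exact value names may differ between chart versions):

sdp@fl4u42:~$ helm repo add nfd https://kubernetes-sigs.github.io/node-feature-discovery/charts
sdp@fl4u42:~$ helm install nfd nfd/node-feature-discovery --set topologyUpdater.enable=true

Here is the full log from the topology-updater pod: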
sdp@fl4u42:~$ kubectl logs -f nfd-node-feature-discovery-topology-updater-g8wl7
I0430 12:06:36.208275 1 nfd-topology-updater.go:163] "Node Feature Discovery Topology Updater" version="v0.17.3" nodeName="fl4u42"
I0430 12:06:36.208337 1 component.go:34] [core]original dial target is: "/host-var/lib/kubelet-podresources/kubelet.sock"
I0430 12:06:36.208357 1 component.go:34] [core][Channel #1]Channel created
I0430 12:06:36.208371 1 component.go:34] [core][Channel #1]parsed dial target is: resolver.Target{URL:url.URL{Scheme:"passthrough", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"//host-var/lib/kubelet-podresources/kubelet.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
I0430 12:06:36.208375 1 component.go:34] [core][Channel #1]Channel authority set to "%2Fhost-var%2Flib%2Fkubelet-podresources%2Fkubelet.sock"
I0430 12:06:36.208511 1 component.go:34] [core][Channel #1]Resolver state updated: {
"Addresses": [
{
"Addr": "/host-var/lib/kubelet-podresources/kubelet.sock",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Endpoints": [
{
"Addresses": [
{
"Addr": "/host-var/lib/kubelet-podresources/kubelet.sock",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Attributes": null
}
],
"ServiceConfig": null,
"Attributes": null
} (resolver returned new addresses)
I0430 12:06:36.208535 1 component.go:34] [core][Channel #1]Channel switches to new LB policy "pick_first"
I0430 12:06:36.208562 1 component.go:34] [core][Channel #1 SubChannel #2]Subchannel created
I0430 12:06:36.208569 1 component.go:34] [core][Channel #1]Channel Connectivity change to CONNECTING
I0430 12:06:36.208577 1 component.go:34] [core][Channel #1]Channel exiting idle mode
2025/04/30 12:06:36 Connected to '"/host-var/lib/kubelet-podresources/kubelet.sock"'!
I0430 12:06:36.208679 1 component.go:34] [core][Channel #1 SubChannel #2]Subchannel Connectivity change to CONNECTING
I0430 12:06:36.208720 1 component.go:34] [core][Channel #1 SubChannel #2]Subchannel picks a new address "/host-var/lib/kubelet-podresources/kubelet.sock" to connect
I0430 12:06:36.208987 1 component.go:34] [core][Channel #1 SubChannel #2]Subchannel Connectivity change to READY
I0430 12:06:36.209010 1 component.go:34] [core][Channel #1]Channel Connectivity change to READY
I0430 12:06:36.209018 1 nfd-topology-updater.go:375] "configuration file parsed" path="/etc/kubernetes/node-feature-discovery/nfd-topology-updater.conf" config={"ExcludeList":null}
I0430 12:06:36.209061 1 podresourcesscanner.go:53] "watching all namespaces"
WARNING: failed to read int from file: open /host-sys/devices/system/node/node0/cpu0/online: no such file or directory
I0430 12:06:36.209247 1 metrics.go:44] "metrics server starting" port=":8081"
I0430 12:06:36.267613 1 component.go:34] [core][Server #4]Server created
I0430 12:06:36.267645 1 nfd-topology-updater.go:145] "gRPC health server serving" port=8082
I0430 12:06:36.267690 1 component.go:34] [core][Server #4 ListenSocket #5]ListenSocket created
I0430 12:07:36.217041 1 podresourcesscanner.go:137] "podFingerprint calculated" status=<
> processing node ""
> processing 15 pods
+ aibrix-system/aibrix-kuberay-operator-55f5ddcbf4-vqrwb
+ default/nfd-node-feature-discovery-worker-w5cvn
+ aibrix-system/aibrix-redis-master-7bff9b56f5-hs5k4
+ envoy-gateway-system/envoy-gateway-5bfc954ffc-k4tf7
+ kube-system/metrics-server-5985cbc9d7-vh9pb
+ aibrix-system/aibrix-controller-manager-6489d5b587-hj2bt
+ aibrix-system/aibrix-gateway-plugins-58bdc89d9c-q67pp
+ envoy-gateway-system/envoy-aibrix-system-aibrix-eg-903790dc-54766c9758-l68wh
+ kube-system/helm-install-traefik-crd-kz6kg
+ default/nfd-node-feature-discovery-topology-updater-g8wl7
+ kube-system/svclb-envoy-aibrix-system-aibrix-eg-903790dc-1f213b6c-fdvw4
+ aibrix-system/aibrix-gpu-optimizer-75df97858d-5zb5s
+ kube-system/helm-install-traefik-j89k5
+ aibrix-system/aibrix-metadata-service-66f45c85bc-k8pzx
+ kube-system/local-path-provisioner-5cf85fd84d-hgf67
= pfp0v0011be09f6ff65dbfe0
>
I0430 12:07:36.217093 1 podresourcesscanner.go:148] "scanning pod" podName="aibrix-kuberay-operator-55f5ddcbf4-vqrwb"
I0430 12:07:36.217115 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="aibrix-kuberay-operator-55f5ddcbf4-vqrwb"
I0430 12:07:36.223315 1 podresourcesscanner.go:148] "scanning pod" podName="nfd-node-feature-discovery-worker-w5cvn"
I0430 12:07:36.223325 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="nfd-node-feature-discovery-worker-w5cvn"
I0430 12:07:36.225915 1 podresourcesscanner.go:148] "scanning pod" podName="aibrix-redis-master-7bff9b56f5-hs5k4"
I0430 12:07:36.225935 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="aibrix-redis-master-7bff9b56f5-hs5k4"
I0430 12:07:36.228169 1 podresourcesscanner.go:148] "scanning pod" podName="envoy-gateway-5bfc954ffc-k4tf7"
I0430 12:07:36.228195 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="envoy-gateway-5bfc954ffc-k4tf7"
I0430 12:07:36.231774 1 podresourcesscanner.go:148] "scanning pod" podName="metrics-server-5985cbc9d7-vh9pb"
I0430 12:07:36.231788 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="metrics-server-5985cbc9d7-vh9pb"
I0430 12:07:36.233367 1 podresourcesscanner.go:148] "scanning pod" podName="aibrix-controller-manager-6489d5b587-hj2bt"
I0430 12:07:36.233374 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="aibrix-controller-manager-6489d5b587-hj2bt"
I0430 12:07:36.234769 1 podresourcesscanner.go:148] "scanning pod" podName="aibrix-gateway-plugins-58bdc89d9c-q67pp"
I0430 12:07:36.234779 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="aibrix-gateway-plugins-58bdc89d9c-q67pp"
I0430 12:07:36.236354 1 podresourcesscanner.go:148] "scanning pod" podName="envoy-aibrix-system-aibrix-eg-903790dc-54766c9758-l68wh"
I0430 12:07:36.236361 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="envoy-aibrix-system-aibrix-eg-903790dc-54766c9758-l68wh"
I0430 12:07:36.238011 1 podresourcesscanner.go:148] "scanning pod" podName="helm-install-traefik-crd-kz6kg"
I0430 12:07:36.238017 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="helm-install-traefik-crd-kz6kg"
I0430 12:07:36.239514 1 podresourcesscanner.go:148] "scanning pod" podName="nfd-node-feature-discovery-topology-updater-g8wl7"
I0430 12:07:36.239521 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="nfd-node-feature-discovery-topology-updater-g8wl7"
I0430 12:07:36.241754 1 podresourcesscanner.go:148] "scanning pod" podName="svclb-envoy-aibrix-system-aibrix-eg-903790dc-1f213b6c-fdvw4"
I0430 12:07:36.241760 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="svclb-envoy-aibrix-system-aibrix-eg-903790dc-1f213b6c-fdvw4"
I0430 12:07:36.422134 1 podresourcesscanner.go:148] "scanning pod" podName="aibrix-gpu-optimizer-75df97858d-5zb5s"
I0430 12:07:36.422165 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="aibrix-gpu-optimizer-75df97858d-5zb5s"
I0430 12:07:36.621889 1 podresourcesscanner.go:148] "scanning pod" podName="helm-install-traefik-j89k5"
I0430 12:07:36.621923 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="helm-install-traefik-j89k5"
I0430 12:07:36.821266 1 podresourcesscanner.go:148] "scanning pod" podName="aibrix-metadata-service-66f45c85bc-k8pzx"
I0430 12:07:36.821294 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="aibrix-metadata-service-66f45c85bc-k8pzx"
I0430 12:07:37.022025 1 podresourcesscanner.go:148] "scanning pod" podName="local-path-provisioner-5cf85fd84d-hgf67"
I0430 12:07:37.022057 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="local-path-provisioner-5cf85fd84d-hgf67"
I0430 12:07:37.432143 1 metrics.go:51] "stopping metrics server" port=":8081"
I0430 12:07:37.432207 1 metrics.go:45] "metrics server stopped" exitCode="http: Server closed"
E0430 12:07:37.432223 1 main.go:66] "error while running" err="failed to create NodeResourceTopology: the server could not find the requested resource (post noderesourcetopologies.topology.node.k8s.io)"
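The final error shows the API server rejecting the POST for noderesourcetopologies.topology.node.k8s.io, i.e. the NodeResourceTopology CRD does not seem to be installed in the cluster. Whether the CRD exists can be checked with a standard kubectl query (the resource name is taken from the error above):

sdp@fl4u42:~$ kubectl get crd noderesourcetopologies.topology.node.k8s.io

If the CRD is missing, installing it and restarting the updater should allow the NodeResourceTopology objects to be created. I believe the chart exposes a topologyUpdater.createCRDs value for this, but that flag is an assumption on my part and may vary by chart version; afterwards the per-node objects can be listed to verify:

sdp@fl4u42:~$ helm upgrade --install nfd nfd/node-feature-discovery --set topologyUpdater.enable=true --set topologyUpdater.createCRDs=true
sdp@fl4u42:~$ kubectl get noderesourcetopologies.topology.node.k8s.io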
NUMA section of the lscpu output:
NUMA:
NUMA node(s): 8
NUMA node0 CPU(s): 0-13,112-125
NUMA node1 CPU(s): 14-27,126-139
NUMA node2 CPU(s): 28-41,140-153
NUMA node3 CPU(s): 42-55,154-167
NUMA node4 CPU(s): 56-69,168-181
NUMA node5 CPU(s): 70-83,182-195
NUMA node6 CPU(s): 84-97,196-209
NUMA node7 CPU(s): 98-111,210-223
Environment:
- Kubernetes version (kubectl version): v1.31.3+k3s1
- Cloud provider or hardware configuration: On-prem hardware, Intel(R) Xeon(R) Platinum 8480+, 512GB RAM
- OS (cat /etc/os-release): Ubuntu 23.04
- Kernel (uname -a): 6.2.0-39-generic
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others: