Pod Fails Mounting Volume - Device Attach Timeout #1850
Comments
Here is a support bundle I generated:
I was able to solve the initial firewall issue with the attachdetach-controller more specifically than just turning the firewall off.
These are the firewall deny logs from before I opened the port and the attach succeeded:
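(For reference, the targeted alternative to disabling the firewall can look like the following on Rocky Linux with firewalld. The port numbers are assumptions, not taken from this thread; check which ports your io-engine gRPC endpoint and NVMe-oF TCP target actually listen on before applying:)

```
# Open the assumed io-engine gRPC port (10124) and NVMe-oF TCP target
# port (8420) on each storage node -- adjust to your deployment's ports.
sudo firewall-cmd --permanent --add-port=10124/tcp
sudo firewall-cmd --permanent --add-port=8420/tcp
sudo firewall-cmd --reload
```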
I did some more testing, and I was able to see the volume was mounted to the node at:
Something seems off here.
I recreated the pod and volume, but the same kubelet mount error persists. It does look like the subsystem error is gone from the logs now, though, and I see the volume is mounted on the node:
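(For reference, this kind of node-side state can be checked directly on the application node. The commands assume nvme-cli and util-linux are installed; the device name /dev/nvme0n1 is taken from later in this thread and may differ on other setups:)

```
# List NVMe namespaces the node sees, including NVMe-oF attached ones
sudo nvme list
# Show connected NVMe-oF subsystems and their transport state
sudo nvme list-subsys
# Confirm whether the device is actually mounted, and where
findmnt /dev/nvme0n1
lsblk /dev/nvme0n1
```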
io-engine pod logs look better:
Still seeing these errors in the csi-node logs for the node running the pod (which looks to have successfully mounted the volume to /dev/nvme0n1):
Please share the dmesg logs from the application node.
Here are the dmesg logs from that time period. Note: confusingly, the io-engine logs appear to be in UTC, while this system's time is PDT.
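(To line the two logs up, GNU date can convert a UTC io-engine timestamp into PDT. The timestamp below is just an illustration, not one from the logs:)

```shell
# Convert a UTC log timestamp into Pacific local time.
# On this date PDT (UTC-7) is in effect, so 18:00 UTC is 11:00 PDT.
TZ=America/Los_Angeles date -d '2024-10-01 18:00 UTC' '+%Y-%m-%d %H:%M %Z'
```

Running `dmesg` with `-T` (or `journalctl --utc`) also avoids the mismatch by printing wall-clock timestamps directly.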
I just tried this again and got the same error, but here are the dmesg logs from the node. Here is the volume as well, with its replicas:
Aha, I think it's this: #1838
That image tag worked, and yes, the linked issue makes sense. I hadn't yet rebooted the nodes for the modified boot parameters to take effect, thinking that could wait until I had done some initial testing.
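(Side note: whether modified boot parameters are actually in effect after a reboot can be verified from the live kernel command line:)

```shell
# Show the kernel command line the node booted with; GRUB config edits
# only appear here after the node has actually been rebooted.
cat /proc/cmdline
```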
Great, thank you for confirming!
Describe the bug
Originally I ran into this bug while creating a DiskPool for my nodes backed by a 14TB RAID (openebs/openebs#3820), but I was able to get past that issue.
Now I am trying to create a PVC and mount it to a pod. The PVC creates correctly, and I can see the volume and its replicas. Originally I was getting an error from the attachdetach-controller about a gRPC timeout; temporarily turning off the firewall fixed that, but I will have to dig into it more later. After I turned off the node firewalls the volume attached successfully, but the mount is failing with this error:
A big caveat is that I only have 2 storage nodes for this testing. My availability of physical hardware is limited, so I adjusted the openebs Helm chart to run 2 replicas of etcd, and I am testing volumes with 1 and 2 replicas.
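(For context, such an override might look like this in a Helm values file. The exact key path is an assumption and may differ between chart versions; verify against the values.yaml shipped with your openebs chart:)

```yaml
# Hypothetical values override for a 2-node test cluster;
# the key path may differ between openebs chart versions.
mayastor:
  etcd:
    replicaCount: 2
```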
I also noticed some concerning logs:
io-engine pod logs on the node with the pod trying to mount the volume:
This is from the csi-node pod on the same node:
To Reproduce
Create a volume with 1 or 2 replicas.
Create a pod attaching the PVC and monitor whether it mounts.
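(A minimal manifest pair for such a reproduction might look like this. The StorageClass name, pod image, and object names are placeholders, not taken from this report; substitute your own:)

```yaml
# Hypothetical reproduction: a PVC on a Mayastor-backed StorageClass
# plus a pod that mounts it. All names below are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: mayastor-2   # assumed 2-replica StorageClass
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "infinity"]
      volumeMounts:
        - mountPath: /data
          name: vol
  volumes:
    - name: vol
      persistentVolumeClaim:
        claimName: test-pvc
```

Watching `kubectl describe pod test-pod` then shows the attach/mount events as they happen.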
Expected behavior
The pod starts and the PVC's volume mounts successfully.
**OS info:**
2 physical servers running Rocky Linux 8.9 and 8.10
k8s 1.31.1
openebs 4.2
Additional context
2 RAID pools per node: 1 for the OS and 1 for OpenEBS data (both LUKS encrypted)