[Arista][T2] Kernel panic seen on supervisor during reboot tests #20901
Labels:
Chassis 🤖 (Modular chassis support)
Issue for 202405
P0 (Priority of the issue)
Triaged (this issue has been triaged)
Comments
@saiarcot895, can you please help with this issue?
@saiarcot895 ping again, thanks.
Will look at this next week.
arista-nwolfe added a commit to arista-nwolfe/sonic-linux-kernel that referenced this issue on Dec 11, 2024.
arista-nwolfe added a commit to arista-nwolfe/sonic-linux-kernel that referenced this issue on Dec 18, 2024:
For certain PCI tree structures involving PCI devices with sibling functions, we can get a NULL pointer dereference when the link state goes down due to this change.
Fixing: sonic-net/sonic-buildimage#20901
Reverting: torvalds/linux@456d8aa
Upstream Discussion: https://lore.kernel.org/linux-pci/20240801171103.GA107989@bhelgaas/T/#t
saiarcot895 pushed a commit to sonic-net/sonic-linux-kernel that referenced this issue on Dec 21, 2024:
…#448)
* Reverting 456d8aa to avoid kernel panics in 6.1.94
For certain PCI tree structures involving PCI devices with sibling functions, we can get a NULL pointer dereference when the link state goes down due to this change.
Fixing: sonic-net/sonic-buildimage#20901
Reverting: torvalds/linux@456d8aa
Upstream Discussion: https://lore.kernel.org/linux-pci/20240801171103.GA107989@bhelgaas/T/#t
* Adding upstream discussion link and summary in patch
kenneth-arista pushed a commit to kenneth-arista/sonic-linux-kernel that referenced this issue on Jan 8, 2025:
…sonic-net#448)
* Reverting 456d8aa to avoid kernel panics in 6.1.94
For certain PCI tree structures involving PCI devices with sibling functions, we can get a NULL pointer dereference when the link state goes down due to this change.
Fixing: sonic-net/sonic-buildimage#20901
Reverting: torvalds/linux@456d8aa
Upstream Discussion: https://lore.kernel.org/linux-pci/20240801171103.GA107989@bhelgaas/T/#t
* Adding upstream discussion link and summary in patch
bingwang-ms pushed a commit to sonic-net/sonic-linux-kernel that referenced this issue on Jan 9, 2025:
…#448) (#453)
* Reverting 456d8aa to avoid kernel panics in 6.1.94
For certain PCI tree structures involving PCI devices with sibling functions, we can get a NULL pointer dereference when the link state goes down due to this change.
Fixing: sonic-net/sonic-buildimage#20901
Reverting: torvalds/linux@456d8aa
Upstream Discussion: https://lore.kernel.org/linux-pci/20240801171103.GA107989@bhelgaas/T/#t
* Adding upstream discussion link and summary in patch
Co-authored-by: arista-nwolfe <[email protected]>
arista-nwolfe added commits to arista-nwolfe/sonic-linux-kernel that referenced this issue on Apr 25, 2025 and Apr 28, 2025:
Both the revert and the fix should avoid the kernel panics described in sonic-net/sonic-buildimage#20901, but patching in the fix now should make it easier to adopt this change in future kernel version bumps.
saiarcot895 pushed a commit to sonic-net/sonic-linux-kernel that referenced this issue on Apr 30, 2025:
* [PCI/ASPM] Replaced revert with patch of proper upstream fix
Both the revert and the fix should avoid the kernel panics described in sonic-net/sonic-buildimage#20901, but patching in the fix now should make it easier to adopt this change in future kernel version bumps.
* Added upstream commit hash to patch
As indicated in aristanetworks/sonic#109, a kernel panic can occur on the supervisor during reboot tests (module API platform tests). The panic was introduced by the kernel upgrade to 6.1.94 (6.1.0-22-2) in #19885.
Further investigation shows that this specific change appears to have caused the kernel panic:
torvalds/linux@456d8aa
Comparing drivers/pci/pcie/aspm.c in the previous version (6.1.38):
https://elixir.free-electrons.com/linux/v6.1.38/source/drivers/pci/pcie/aspm.c#L1003
with the newer version (6.1.94) shows that this commit is present in the newer version:
https://elixir.free-electrons.com/linux/v6.1.94/source/drivers/pci/pcie/aspm.c#L1018