
T2: Zebra core at route_map_lookup_by_name/hash_get intermittently on LC 'config reload' #37


Closed
rajendrat opened this issue Aug 23, 2023 · 0 comments

Comments

@rajendrat

  • When reporting a crash, provide a backtrace
  • When pasting configs, logs, shell output, backtraces, and other large chunks of text use Markdown code blocks
  • Include the FRR version; if you built from Git, please provide the commit hash
  • Write your issue in English

Describe the bug

Zebra cores on config reload in route_map_lookup_by_name/hash_get. The crash occurs on different code paths in different instances: sometimes on route deletion, sometimes on route-map lookup. Tracebacks for the different paths are added below.

[X] Did you check if this is a duplicate issue?
[ ] Did you test it on the latest FRRouting/frr master branch?

To Reproduce
Steps to reproduce the behavior:
Run the sonic-mgmt/acl test with presanity, which performs a config reload.
The issue is not seen every time; roughly 1 out of 4 iterations hits it.

Screenshots
Route_map_delete path
image

image

Versions

Additional context
The issue is seen when BFD is enabled.
viz
@abdosi , @anamehra , @vperumal

lguohan pushed a commit to sonic-net/sonic-buildimage that referenced this issue Sep 9, 2023
…16456)

Why I did it
A Zebra core is sometimes seen during config reload. A series of route-map deletions followed by re-adds triggers the hash table to realloc and grow to a larger size; subsequent route-map operations then run against a corrupted hash table.

The issue is seen when BFD is enabled on the static route table, where static route-maps are created and deleted based on BFD session state. However, the issue itself is generic from an FRR perspective.

The issue sonic-net/sonic-frr#37 has detailed core info. This PR fixes that issue.
Fixes sonic-net/sonic-frr#37

Work item tracking
Microsoft ADO (17952227):

How I did it
This fix is already in master frr/8.2.5. This PR ports it to the 202205 branch to address this Zebra core.
sonic-net/sonic-frr@5f503e5

Solution:
The whole purpose of delaying deletion and storing the route-map is to let the using protocol process the route-map at a later time while still retaining the route-map name (for more efficient reprocessing). The problem exists because we keep multiple copies of deletion events that are indistinguishable from each other, causing hash havoc.
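
The duplicate-record problem described above can be illustrated with a toy model. This is only a sketch and not FRR code: the class `PendingDeletes`, its methods, and the `dedupe` flag are hypothetical names, and a plain list stands in for a hash bucket chain. It assumes that a consumer releases one record per processed name, so any extra indistinguishable copies are left behind as stale entries.

```python
class PendingDeletes:
    """Toy model of deferred route-map deletions keyed by name.

    Hypothetical illustration only; this is not FRR's data structure.
    """

    def __init__(self, dedupe):
        self.dedupe = dedupe
        self.entries = []  # stands in for a hash bucket chain

    def mark_deleted(self, name):
        # With dedupe on, keep at most one deletion record per name.
        if self.dedupe and name in self.entries:
            return
        self.entries.append(name)

    def process(self, name):
        # The consumer handles the deletion and releases ONE record.
        if name in self.entries:
            self.entries.remove(name)
            return True
        return False


def run(dedupe):
    table = PendingDeletes(dedupe)
    # Repeated delete events for the same route-map name,
    # as could happen during delete/re-add cycles.
    for _ in range(3):
        table.mark_deleted("RM_STATIC")
    table.process("RM_STATIC")
    return table.entries


stale = run(dedupe=False)  # two indistinguishable copies remain
clean = run(dedupe=True)   # nothing remains after processing
print(len(stale), len(clean))  # prints "2 0"
```

In the non-deduplicated run, two records with the same key survive after the consumer has finished. In a real hash table, leftovers of this kind are what later lookups and resizes could trip over.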

How to verify it
Verified by running the sonic-mgmt test with multiple config reloads.