[master] thermalctld leak on Arista devices makes them unreachable when memory is exhausted

#### Description

On Arista devices, `thermalctld` leaks memory (it is unclear whether other vendors are affected).
Each loop iteration of `thermalctld` consumes a few more MiB (~3 MiB every 60 s).
After running for a few hours, `pmon` consumes up to 75% of memory, at which point the device becomes unresponsive:
- Existing ssh sessions will freeze
- Console becomes unresponsive
- Pings still go through
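
For a rough sense of how long a device can run before reaching that point, the figures above can be turned into a back-of-the-envelope estimate. This is only a sketch: the function name and the memory sizes in the usage note are hypothetical, and it assumes the leak stays linear at about 3 MiB per minute.

```python
def hours_until_threshold(total_mib, pmon_mib, leak_mib_per_min=3.0, threshold=0.75):
    """Estimate hours until pmon reaches `threshold` of total memory.

    Assumes linear growth at `leak_mib_per_min` (about 3 MiB/min was
    observed in this report). `total_mib` and `pmon_mib` are inputs the
    caller must supply; they are not taken from the report.
    """
    headroom_mib = threshold * total_mib - pmon_mib
    if headroom_mib <= 0:
        return 0.0  # already at or past the threshold
    return headroom_mib / leak_mib_per_min / 60.0
```

For example, on a hypothetical device with 4096 MiB of RAM where `pmon` currently uses 1024 MiB, `hours_until_threshold(4096, 1024)` gives a little over 11 hours.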

#### Steps to reproduce the issue:
1. Install the latest master image.
2. Wait a few hours while monitoring memory usage (to speed this up, fill a tmpfs to reduce available memory):
    - Run `docker stats` to watch the memory usage of the `pmon` container grow.
    - Run `watch -d -n 1 'docker exec -ti pmon sh -c "ps aux | grep -v aux"'` to watch memory grow at the process level inside `pmon`.
3. Observe SSH and console sessions hanging while ping still works.
4. After some more time, the kernel panics.
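
To put a number on the growth seen in step 2, the `ps aux` output can be sampled and reduced to a leak rate. The following is a minimal sketch, not part of any SONiC tooling: the helper names are made up for illustration, and it assumes the standard `ps aux` column layout, where the sixth column is RSS in KiB.

```python
import subprocess


def rss_kib(ps_aux_output, needle):
    """Sum the RSS column (KiB) of `ps aux` rows whose command contains `needle`."""
    total = 0
    for line in ps_aux_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # 11th field is the full command line
        if len(fields) == 11 and needle in fields[10]:
            total += int(fields[5])
    return total


def leak_rate_mib_per_min(samples):
    """Average growth between the first and last (seconds, rss_kib) samples."""
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    return (r1 - r0) / 1024 / ((t1 - t0) / 60)


def sample_pmon(needle="thermalctld"):
    """Take one RSS sample from inside the pmon container (requires docker)."""
    out = subprocess.run(
        ["docker", "exec", "pmon", "ps", "aux"],
        capture_output=True, text=True, check=True,
    ).stdout
    return rss_kib(out, needle)
```

Calling `sample_pmon()` once a minute and passing the collected `(timestamp, rss)` pairs to `leak_rate_mib_per_min` should give a value close to the ~3 MiB per minute reported above, if the leak is present.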

#### Output seen on the console after kernel panic

```
[17758.188885] Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
[17758.188885]
[17758.305661] CPU: 2 PID: 4197 Comm: supervisord Tainted: G           OE     4.19.0-12-2-amd64 #1 Debian 4.19.152-1
[17758.428678] Hardware name: Intel Camelback Mountain CRB/Camelback Mountain CRB, BIOS Aboot-norcal7-rook-2x4--6128821 09/14/2017
[17758.566281] Call Trace:
[17758.595557]  dump_stack+0x66/0x90
[17758.635242]  panic+0xe7/0x24a
[17758.670765]  out_of_memory.cold.33+0x5e/0x82
[17758.721909]  __alloc_pages_slowpath+0xbd8/0xcb0
[17758.776180]  __alloc_pages_nodemask+0x28b/0x2b0
[17758.830450]  filemap_fault+0x333/0x780
[17758.875346]  ? alloc_set_pte+0x49e/0x560
[17758.922325]  ? filemap_map_pages+0x139/0x3a0
[17758.973494]  ext4_filemap_fault+0x2c/0x40 [ext4]
[17759.028809]  __do_fault+0x36/0x130
[17759.069538]  __handle_mm_fault+0xdf9/0x11f0
[17759.119642]  handle_mm_fault+0xd6/0x200
[17759.165579]  __do_page_fault+0x249/0x4f0
[17759.212560]  ? page_fault+0x8/0x30
[17759.253287]  page_fault+0x1e/0x30
[17759.292974] RIP: 0033:0x54e6ba
```

#### Additional information you deem important (e.g. issue happens only occasionally):

This issue happens consistently on master.
It is currently being investigated; this issue was opened for awareness.

[master] thermalctld leak on Arista devices makes them unreachable when memory is exhausted #7515
