Commit 8f6aca8

kdump support (#729)
In the event of a kernel crash, we need to gather as much information as possible to understand and identify the root cause of the crash. Currently, the kernel does not provide much information, which makes kernel crash investigation difficult and time-consuming.

Fortunately, the kernel has a way to provide more information when it crashes: kdump is a feature of the Linux kernel that creates crash dumps in the event of a kernel crash.

This PR adds kernel kdump support. Please note that another PR is also needed: sonic-net/sonic-buildimage#3722

An extension to the CLI utilities config and show is provided to configure and manage kdump:
- view kdump status (enabled/disabled, active, configuration, stored crash files)
- enable / disable kdump functionality
- configure kdump (how many kernel crash logs can be saved, memory allocated for capture kernel)
- view kernel crash logs

A design document describes this kdump implementation: sonic-net/SONiC#510
1 parent d68cc05 commit 8f6aca8

6 files changed: +779 -1 lines changed

config/main.py

Lines changed: 45 additions & 0 deletions
@@ -1200,6 +1200,51 @@ def shutdown():
     """Shut down BGP session(s)"""
     pass

+@config.group()
+def kdump():
+    """ Configure kdump """
+    if os.geteuid() != 0:
+        exit("Root privileges are required for this operation")
+    pass
+
+@kdump.command()
+def disable():
+    """Disable kdump operation"""
+    config_db = ConfigDBConnector()
+    if config_db is not None:
+        config_db.connect()
+        config_db.mod_entry("KDUMP", "config", {"enabled": "false"})
+        run_command("sonic-kdump-config --disable")
+
+@kdump.command()
+def enable():
+    """Enable kdump operation"""
+    config_db = ConfigDBConnector()
+    if config_db is not None:
+        config_db.connect()
+        config_db.mod_entry("KDUMP", "config", {"enabled": "true"})
+        run_command("sonic-kdump-config --enable")
+
+@kdump.command()
+@click.argument('kdump_memory', metavar='<kdump_memory>', required=True)
+def memory(kdump_memory):
+    """Set memory allocated for kdump capture kernel"""
+    config_db = ConfigDBConnector()
+    if config_db is not None:
+        config_db.connect()
+        config_db.mod_entry("KDUMP", "config", {"memory": kdump_memory})
+        run_command("sonic-kdump-config --memory %s" % kdump_memory)
+
+@kdump.command()
+@click.argument('kdump_num_dumps', metavar='<kdump_num_dumps>', required=True, type=int)
+def num_dumps(kdump_num_dumps):
+    """Set max number of dump files for kdump"""
+    config_db = ConfigDBConnector()
+    if config_db is not None:
+        config_db.connect()
+        config_db.mod_entry("KDUMP", "config", {"num_dumps": kdump_num_dumps})
+        run_command("sonic-kdump-config --num_dumps %d" % kdump_num_dumps)
+
 # 'all' subcommand
 @shutdown.command()
 @click.option('-v', '--verbose', is_flag=True, help="Enable verbose output")
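
The four kdump subcommands in this hunk share one pattern: record the setting under the "config" key of the KDUMP table, then call sonic-kdump-config with the matching flag. The following is a minimal sketch of that mapping, not the code from the commit. The ConfigDB write is replaced by a plain dict and the shell call by a returned command string, so it runs without a SONiC system. The names `apply_kdump_setting` and `FLAG_FOR_KEY` and the value "512M" are hypothetical and chosen only for illustration.

```python
# Hypothetical stand-in for the pattern shared by the kdump subcommands.
# Nothing here talks to ConfigDB or executes sonic-kdump-config.

FLAG_FOR_KEY = {
    "memory": "--memory",
    "num_dumps": "--num_dumps",
}


def apply_kdump_setting(table, key, value):
    """Record the setting in `table` and return the command the CLI would run."""
    if key == "enabled":
        # enable/disable store the strings "true"/"false" and take no argument
        table["enabled"] = "true" if value else "false"
        flag = "--enable" if value else "--disable"
        return "sonic-kdump-config %s" % flag
    if key not in FLAG_FOR_KEY:
        raise ValueError("unknown kdump setting: %s" % key)
    table[key] = value
    return "sonic-kdump-config %s %s" % (FLAG_FOR_KEY[key], value)


kdump_config = {}  # stands in for the KDUMP "config" entry
commands = [
    apply_kdump_setting(kdump_config, "enabled", True),
    apply_kdump_setting(kdump_config, "memory", "512M"),  # illustrative value
    apply_kdump_setting(kdump_config, "num_dumps", 3),
]
```

Keeping the table update and the command string in one place makes it easier to see that every setting is stored and applied together, which is the behaviour the hunk implements four times over.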

scripts/generate_dump

Lines changed: 12 additions & 0 deletions
@@ -436,6 +436,18 @@ main() {
         fi
     done

+    # archive kernel dump files
+    for file in $(find_files "/var/crash/"); do
+        # don't gzip already-gzipped dmesg files :)
+        if [ ! ${file} = "/var/crash/kexec_cmd" -a ! ${file} = "/var/crash/export" ]; then
+            if [[ ${file} == *"kdump."* ]]; then
+                save_file $file kdump false
+            else
+                save_file $file kdump true
+            fi
+        fi
+    done
+
     # clean up working tar dir before compressing
     $RM $V -rf $TARDIR
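
The selection logic in this hunk can be restated as a small Python sketch. It assumes, as the inline comment suggests, that the third argument to save_file controls gzip compression. The function name `plan_crash_archive` and the example file names are hypothetical; only the two skipped paths and the "kdump." substring test come from the diff.

```python
# Paths the loop skips entirely; taken from the shell condition in the hunk.
SKIPPED = {"/var/crash/kexec_cmd", "/var/crash/export"}


def plan_crash_archive(paths):
    """Return (path, gzip) pairs mirroring the loop's save_file calls."""
    plan = []
    for path in paths:
        if path in SKIPPED:
            continue
        # bash `[[ $file == *"kdump."* ]]` is a substring match on the whole path
        gzip_it = "kdump." not in path
        plan.append((path, gzip_it))
    return plan


example = [
    "/var/crash/kexec_cmd",
    "/var/crash/export",
    "/var/crash/202001010000/kdump.202001010000",  # hypothetical file name
    "/var/crash/202001010000/dmesg.202001010000",  # hypothetical file name
]
plan = plan_crash_archive(example)
```

Because the match is on the full path rather than the base name, any file under a directory whose name contains "kdump." would also be stored without gzip.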

scripts/reboot

Lines changed: 7 additions & 0 deletions
@@ -1,5 +1,12 @@
 #!/bin/bash

+# Reboot immediately if we run the kdump capture kernel
+VMCORE_FILE=/proc/vmcore
+if [ -e $VMCORE_FILE -a -s $VMCORE_FILE ]; then
+    debug "We have a /proc/vmcore, then we just kdump'ed"
+    /sbin/reboot
+fi
+
 REBOOT_USER=$(logname)
 REBOOT_TIME=$(date)
 PLATFORM=$(sonic-cfggen -H -v DEVICE_METADATA.localhost.platform)
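
The check added to scripts/reboot treats a non-empty /proc/vmcore as the sign that the system is running the kdump capture kernel. A minimal Python sketch of the same test follows. The function name `running_capture_kernel` is hypothetical, and the demonstration uses temporary files instead of the real /proc/vmcore so it can run on any machine.

```python
import os
import tempfile


def running_capture_kernel(vmcore_path="/proc/vmcore"):
    """True when vmcore_path exists and is non-empty, like `[ -e f -a -s f ]`."""
    try:
        return os.path.getsize(vmcore_path) > 0
    except OSError:
        # missing or inaccessible file: treat as not in the capture kernel
        return False


# Demonstration against temporary files rather than the real /proc/vmcore.
with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "empty")
    filled = os.path.join(tmp, "filled")
    open(empty, "w").close()
    with open(filled, "w") as fh:
        fh.write("x")
    results = (
        running_capture_kernel(os.path.join(tmp, "missing")),
        running_capture_kernel(empty),
        running_capture_kernel(filled),
    )
```

Only the third case, a file that exists and has content, corresponds to the branch in the script that calls /sbin/reboot directly.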
