drgn: add runq command #513

richl9 · 2025-07-01T01:15:46Z

No description provided.

…gnatures We will need these for upcoming commits, and the standard library doesn't provide them. There is a (currently unmerged) pull request to add them that we can base ours on. Signed-off-by: Omar Sandoval <[email protected]>

The default argparse.HelpFormatter wraps an entire string as one paragraph. argparse.RawDescriptionHelpFormatter and argparse.RawTextHelpFormatter use the given string verbatim. For upcoming commands, we want a middle ground: multiple paragraphs that can each be wrapped, as well as blocks of text that are not wrapped (for tables and example output, for example). Add a HelpFormatter subclass that does this with a reStructuredText-inspired syntax: This is a paragraph, which will be wrapped. |This is a line block. |It will not be wrapped. P.S. Lib/argparse.py in CPython currently says about HelpFormatter: Only the name of this class is considered a public API. All the methods provided by the class are considered an implementation detail. But we don't have much of a choice, so we do the best we can with unit tests. Signed-off-by: Omar Sandoval <[email protected]>

An upcoming change for automatically documenting commands will need this. Signed-off-by: Omar Sandoval <[email protected]>

Co-authored-by: Stephen Brennan <[email protected]>

This command is a straightforward wrapper around already existing helpers, so it serves as a good first example for future commands. Signed-off-by: Omar Sandoval <[email protected]>

brenns10

I intend to come back and do an actual review, but I just wanted to remark on the copyright header now since it's a quick and easy item.

brenns10 · 2025-07-01T01:50:52Z

drgn/commands/_builtin/crash/runq.py

+# This file contains code adapted from:
+# https://github.com/oracle-samples/drgn-tools (Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/)


Just to be clear, we (Oracle Linux engineers) are already approved to submit code to drgn under its existing license. Even if you didn't originally author the drgn-tools code, you're still authorized to submit it under the DCO and our legal approval. So these lines aren't really relevant to drgn's licensing.

The only benefit to retaining them is to let readers know that there is a permissively licensed version of this code which they could use under more lenient terms than the LGPL. But as time goes on and this code evolves, that will be less and less true. Especially given that we're likely to switch to drgn's version, send fixes upstream, and eventually get rid of our own copy.

I'm not against keeping these lines. But if you added them for the purpose of covering yourself against an unknown legal requirement, then I would say you're safe to drop them.

That's good to know. I added them to cite the sources. Let me drop them then:)

Signed-off-by: Richard Li <[email protected]>

brenns10 · 2025-07-01T17:32:23Z

drgn/commands/_builtin/crash/runq.py

+def task_thread_info(task: Object) -> Object:
+    """
+    Return a task's ``thread_info``
+
+    This is an equivalent to the kernel function / inline / macro
+    ``task_thread_info()``, but it must cover a wide variety of versions and
+    configurations.
+
+    :param task: Object of type ``struct task_struct *``
+    :returns: The ``struct thread_info *`` for this task
+    """
+    if has_member(task, "thread_info"):
+        return task.thread_info.address_of_()
+    return cast("struct thread_info *", task.stack)
+
+
+def task_cpu(task: Object) -> int:
+    """
+    Return the CPU on which a task is running.
+
+    This is an equivalent to the kernel function ``task_cpu()``, but it covers
+    a wide variety of variations in kernel version and configuration. It would
+    be a bit impractical to spell out all the variants, but essentially, if
+    there's a "cpu" field in ``struct task_struct``, then we can just use that.
+    Otherwise, we need to get it from the ``thread_info``.
+
+    :param task: Object of type ``struct task_struct *``
+    :retruns: The cpu as a Python int
+    """
+    if has_member(task, "cpu"):
+        return task.cpu.value_()
+    return task_thread_info(task).cpu.value_()


Both of these are now built-in to drgn, in drgn.helpers.linux.sched, so we can drop these and use them directly.

brenns10 · 2025-07-01T17:50:56Z

drgn/commands/_builtin/crash/runq.py

+def has_member(obj: Object, name: str) -> bool:
+    """
+    Return true if a given object has a member with the given name.
+    :param obj: Drgn object to check
+    :param name: string member name to check
+    :returns: whether the object has a member by that name
+    """
+    try:
+        obj.member_(name)
+        return True
+    except LookupError:
+        return False


This is a bit of a drgn-tools way of doing things which isn't necessarily common in drgn code itself. Normally, upstream drgn code will just try to access a member, and then handle the AttributeError by trying the alternative name.

We found that a lot of developers weren't necessarily comfortable with that approach and the has_member() approach made them a bit more comfortable. But I think for contributing to drgn, it's best to stick with drgn's way of doing things, and not use has_member().

Regardless, if we were to include has_member() in drgn, it would likely not belong in this module.

brenns10 · 2025-07-01T18:07:29Z

drgn/commands/_builtin/crash/runq.py

+        return task.cpu.value_()
+    return task_thread_info(task).cpu.value_()
+
+


This is a more general comment about the various helpers below.

The idea for these crash commands is that they'll mostly just be calling drgn helpers (from drgn.helpers.linux) and then formatting the output. Thus, I'm thinking these would be good guidelines:

Anything that does logic for the command should be a helper in the appropriate location. (EG many below would belong in drgn.helpers.sched)

Helpers generally should not be printing formatted info, instead they should be returning info to the caller. For cases where they are smoothing over incompatibilities between kernel versions, they can return a NamedTuple that gets built differently on different kernel versions. A good example of this is drgn.helpers.printk.get_printk_records(), which is implemented quite differently on different kernel versions, yet returns the same record structure to the caller, who can then format it and print it however they would like.

Whatever remains in the drgn.commands._builtin.crash package should really only deal with printing out the data, as well as formatting the --drgn code output. Since none of that code is intended to be consumed externally, the remaining function names should probably all be prefixed by an underscore, so they're known to be private.

Omar may want to clarify or update this, since it's just my understanding.

This is all spot on, no notes 👍

brenns10 · 2025-07-01T18:11:50Z

drgn/commands/_builtin/crash/runq.py

+def _parse_cpus_arg(cpus_arg: str, max_cpu: int) -> Set[int]:
+    """
+    Parse argument to -c for cpu restriction (e.g. '1,3,5-8').
+
+    :param cpus_arg: str
+    :param  max_cpu: int
+    :returns: a set of specified cpus
+    """
+    cpus = set()
+    for part in cpus_arg.split(","):
+        part = part.strip()
+        if "-" in part:
+            start, end = part.split("-")
+            for cpu in range(int(start), int(end) + 1):
+                if 0 <= cpu < max_cpu:
+                    cpus.add(cpu)
+        elif part:
+            cpu = int(part)
+            if 0 <= cpu < max_cpu:
+                cpus.add(cpu)
+    return cpus


Honestly, this is a useful helper on its own. It should probably belong in drgn.helpers.common.format or something like that, and could be renamed so that it doesn't specifically mention CPUs, just ranges of integers.

brenns10 · 2025-07-01T18:19:31Z

drgn/commands/_builtin/crash/runq.py

+        print("     [no tasks queued]")
+
+
+def timestamp_str(ns: int) -> str:


Tiny thought, but it may be a better API to represent the task_lastrun2now() as a datetime.timedelta().

And totally unrelatedly, Python has a bulit-in function divmod(), which returns both parts of the integer division. I guess it might be more efficient, but I think it just helps shorten and make it a bit more readable. EG:

>>> value = 123456 >>> mins, secs = divmod(value, 60) >>> hours, mins = divmod(mins, 60) >>> days, hours = divmod(hours, 24) >>> days, hours, mins, secs (1, 10, 17, 36)

brenns10 · 2025-07-01T18:49:24Z

drgn/commands/_builtin/crash/runq.py

+@crash_command(
+    description="Display the tasks on the run queues of each cpu.",
+    arguments=(
+        argument("-t", action="store_true", dest="show_timestamps"),
+        argument("-T", action="store_true", dest="show_lag"),
+        argument("-m", action="store_true", dest="pretty_runtime"),
+        argument("-g", action="store_true", dest="group"),
+        argument("-c", type=str, default="", dest="cpus"),
+        drgn_argument,
+    ),
+)


From playing around with the runq command, it seems to me that a lot of these arguments are (thankfully) mutually exclusive, and crash just doesn't document that's the case. The only option that can be combined with the others is -c.

Each of -t | -T | -m are separate modes that behave similarly. They will iterate over the runqueues and print out some sort of timestamp information in various formats. They cannot be combined.

The alternative mode is to print the contents (i.e. tasks) in each runqueue, with -g optionally modifying the way they are printed.

It seems fair to me that we may implement each mode in a separate function, to simplify things. And we could make the mutual exclusion explicit here:

Suggested change

@crash_command(

description="Display the tasks on the run queues of each cpu.",

arguments=(

argument("-t", action="store_true", dest="show_timestamps"),

argument("-T", action="store_true", dest="show_lag"),

argument("-m", action="store_true", dest="pretty_runtime"),

argument("-g", action="store_true", dest="group"),

argument("-c", type=str, default="", dest="cpus"),

drgn_argument,

),

)

@crash_command(

description="Display the tasks on the run queues of each cpu.",

arguments=(

mutually_exclusive_group(

argument("-t", action="store_true", dest="show_timestamps"),

argument("-T", action="store_true", dest="show_lag"),

argument("-m", action="store_true", dest="pretty_runtime"),

argument("-g", action="store_true", dest="group"),

),

argument("-c", type=str, default="", dest="cpus"),

drgn_argument,

),

)

brenns10 · 2025-07-01T18:57:59Z

drgn/commands/_builtin/crash/runq.py

+    return "%d %02d:%02d:%02d.%03d" % (days, hours, mins, secs, ms)
+
+
+def run_queue(prog: Program, args: argparse.Namespace) -> None:


One thing missing here is an implementation of the --drgn argument. The more code we refactor into reusable drgn helpers, the easier it becomes to print the equivalent drgn code. In fact, I think starting from the --drgn output may be a really good way to think about how you want to refactor the code into helpers. For instance, here's some pseudocode I've made up, which we may output for runq --drgn -c 1,3-5:

from drgn.helpers.common.parse import range_list from drgn.helpers.linux.percpu import per_cpu from drgn.helpers.linux.sched import task_lastrun2now for cpu in range_list("1,3-5"): rq = per_cpu(prog["runqueues"], cpu) runtime_ns = task_lastrun2now(rq.curr)

Starting from there, gives a good idea what helpers we'd like to have, how they'd be used, and where they should be imported from.

Again totally agree with the general advice. Your example does raise a small question: should --drgn output use the arguments given in its example output (range_list("1,3-5")) or use placeholders (range_list(cpu_list))? I opted for the latter for the mount command to make the generated code more general, but an argument could be made for the former, too.

I have a slight preference for inserting the arguments into the script, so that ideally the output could be redirected to a file and directly sourced or run via the drgn command without modification.

But then again, substituting an Object into the text wouldn't really make sense... You could output a constructor call (Object(prog, "struct foo *", value=0xdeadbeef)) but that doesn't seem like it would be very helpful - it wouldn't at all be applicable to other programs or situations. So considering the complications of doing this consistently, I think you're right - using a placeholder variable name seems like the best way forward. Maybe there should be a comment at the beginning, containing all the placeholders that the user should define?

Or maybe rather than a comment, we could do this, which makes it clear what's happening but also makes it easy to reassign or change:

cpu_ranges = "1,3-5" for cpu in range_list(cpu_ranges): ...

And for mount:

task = Object(prog, "struct task_struct *", 0xdeadbeef) # or task = find_task(prog, 1) ...

This way things are still easy to source and run, but all the variable declarations based on input parameters are placed at the top, so they're easy to see and edit.

I loved this suggestion, so I modified mount to do this, and I'm also making the crash command test cases exec() the output of --drgn by default to enforce that the output is always valid and runnable.

brenns10 · 2025-07-01T20:10:18Z

drgn/commands/_builtin/crash/runq.py

+    prio_array = (
+        hex(runqueue.rt.active.address_ - 16) if runqueue.rt.active.address_ else 0
+    )


Why do we subtract 16 here? And why wouldn't runqueue.rt.active.address_of_() work here?

richl9 · 2025-07-01T20:47:12Z

@brenns10 Thanks a lot for the detailed comments! Thera are a lot to digest. I will go over them one by one.

osandov and others added 8 commits June 20, 2025 16:49

drgndoc: store value of variables

a1bc216

An upcoming change for automatically documenting commands will need this. Signed-off-by: Omar Sandoval <[email protected]>

Add command framework

837ee72

Co-authored-by: Stephen Brennan <[email protected]>

Add crash compatibility mode

546f6b7

crash: add mount command

7dda3ea

This command is a straightforward wrapper around already existing helpers, so it serves as a good first example for future commands. Signed-off-by: Omar Sandoval <[email protected]>

drgn.helpers.linux.device: add for_each_registered_{chr,blk}dev()

51f75d7

wip crash commands

b53e65b

brenns10 reviewed Jul 1, 2025

View reviewed changes

drgn: add runq command

089995d

Signed-off-by: Richard Li <[email protected]>

richl9 force-pushed the richard/commands/runq branch from 557f137 to 089995d Compare July 1, 2025 16:28

brenns10 suggested changes Jul 1, 2025

View reviewed changes

osandov force-pushed the commands branch 2 times, most recently from 44dc4c1 to eb2fab3 Compare July 2, 2025 22:39

osandov force-pushed the commands branch 2 times, most recently from c596f2e to 1e0b710 Compare July 10, 2025 08:12

		# This file contains code adapted from:
		# https://github.com/oracle-samples/drgn-tools (Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/)

		return task.cpu.value_()
		return task_thread_info(task).cpu.value_()

		print(" [no tasks queued]")


		def timestamp_str(ns: int) -> str:

		return "%d %02d:%02d:%02d.%03d" % (days, hours, mins, secs, ms)


		def run_queue(prog: Program, args: argparse.Namespace) -> None:

drgn: add runq command #513

Are you sure you want to change the base?

drgn: add runq command #513

Uh oh!

Conversation

richl9 commented Jul 1, 2025

Uh oh!

brenns10 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

brenns10 Jul 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

osandov Jul 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

richl9 commented Jul 1, 2025

Uh oh!

Uh oh!

brenns10 Jul 1, 2025 •

edited

Loading

osandov Jul 2, 2025 •

edited

Loading