Skip to content

drgn: add runq command #513

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: commands
Choose a base branch
from

Conversation

richl9
Copy link

@richl9 richl9 commented Jul 1, 2025

No description provided.

osandov and others added 8 commits June 20, 2025 16:49
…gnatures

We will need these for upcoming commits, and the standard library
doesn't provide them. There is a (currently unmerged) pull request to
add them that we can base ours on.

Signed-off-by: Omar Sandoval <[email protected]>
The default argparse.HelpFormatter wraps an entire string as one
paragraph. argparse.RawDescriptionHelpFormatter and
argparse.RawTextHelpFormatter use the given string verbatim. For
upcoming commands, we want a middle ground: multiple paragraphs that can
each be wrapped, as well as blocks of text that are not wrapped (for
tables and example output, for example).

Add a HelpFormatter subclass that does this with a
reStructuredText-inspired syntax:

    This is a paragraph, which will
    be wrapped.

    |This is a line block.
    |It will not be wrapped.

P.S. Lib/argparse.py in CPython currently says about HelpFormatter:

    Only the name of this class is considered a public API. All the methods
    provided by the class are considered an implementation detail.

But we don't have much of a choice, so we do the best we can with unit
tests.

Signed-off-by: Omar Sandoval <[email protected]>
An upcoming change for automatically documenting commands will need
this.

Signed-off-by: Omar Sandoval <[email protected]>
Co-authored-by: Stephen Brennan <[email protected]>
This command is a straightforward wrapper around already existing
helpers, so it serves as a good first example for future commands.

Signed-off-by: Omar Sandoval <[email protected]>
Copy link
Contributor

@brenns10 brenns10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intend to come back and do an actual review, but I just wanted to remark on the copyright header now since it's a quick and easy item.

Comment on lines 3 to 4
# This file contains code adapted from:
# https://github.com/oracle-samples/drgn-tools (Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, we (Oracle Linux engineers) are already approved to submit code to drgn under its existing license. Even if you didn't originally author the drgn-tools code, you're still authorized to submit it under the DCO and our legal approval. So these lines aren't really relevant to drgn's licensing.

The only benefit to retaining them is to let readers know that there is a permissively licensed version of this code which they could use under more lenient terms than the LGPL. But as time goes on and this code evolves, that will be less and less true. Especially given that we're likely to switch to drgn's version, send fixes upstream, and eventually get rid of our own copy.

I'm not against keeping these lines. But if you added them for the purpose of covering yourself against an unknown legal requirement, then I would say you're safe to drop them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good to know. I added them to cite the sources. Let me drop them then:)

Signed-off-by: Richard Li <[email protected]>
@richl9 richl9 force-pushed the richard/commands/runq branch from 557f137 to 089995d Compare July 1, 2025 16:28
Comment on lines +36 to +67
def task_thread_info(task: Object) -> Object:
"""
Return a task's ``thread_info``

This is an equivalent to the kernel function / inline / macro
``task_thread_info()``, but it must cover a wide variety of versions and
configurations.

:param task: Object of type ``struct task_struct *``
:returns: The ``struct thread_info *`` for this task
"""
if has_member(task, "thread_info"):
return task.thread_info.address_of_()
return cast("struct thread_info *", task.stack)


def task_cpu(task: Object) -> int:
"""
Return the CPU on which a task is running.

This is an equivalent to the kernel function ``task_cpu()``, but it covers
a wide variety of variations in kernel version and configuration. It would
be a bit impractical to spell out all the variants, but essentially, if
there's a "cpu" field in ``struct task_struct``, then we can just use that.
Otherwise, we need to get it from the ``thread_info``.

:param task: Object of type ``struct task_struct *``
:retruns: The cpu as a Python int
"""
if has_member(task, "cpu"):
return task.cpu.value_()
return task_thread_info(task).cpu.value_()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these are now built-in to drgn, in drgn.helpers.linux.sched, so we can drop these and use them directly.

Comment on lines +22 to +33
def has_member(obj: Object, name: str) -> bool:
"""
Return true if a given object has a member with the given name.
:param obj: Drgn object to check
:param name: string member name to check
:returns: whether the object has a member by that name
"""
try:
obj.member_(name)
return True
except LookupError:
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a drgn-tools way of doing things which isn't necessarily common in drgn code itself. Normally, upstream drgn code will just try to access a member, and then handle the AttributeError by trying the alternative name.

We found that a lot of developers weren't necessarily comfortable with that approach and the has_member() approach made them a bit more comfortable. But I think for contributing to drgn, it's best to stick with drgn's way of doing things, and not use has_member().

Regardless, if we were to include has_member() in drgn, it would likely not belong in this module.

return task.cpu.value_()
return task_thread_info(task).cpu.value_()


Copy link
Contributor

@brenns10 brenns10 Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a more general comment about the various helpers below.

The idea for these crash commands is that they'll mostly just be calling drgn helpers (from drgn.helpers.linux) and then formatting the output. Thus, I'm thinking these would be good guidelines:

  1. Anything that does logic for the command should be a helper in the appropriate location. (EG many below would belong in drgn.helpers.sched)
  2. Helpers generally should not be printing formatted info, instead they should be returning info to the caller. For cases where they are smoothing over incompatibilities between kernel versions, they can return a NamedTuple that gets built differently on different kernel versions. A good example of this is drgn.helpers.printk.get_printk_records(), which is implemented quite differently on different kernel versions, yet returns the same record structure to the caller, who can then format it and print it however they would like.
  3. Whatever remains in the drgn.commands._builtin.crash package should really only deal with printing out the data, as well as formatting the --drgn code output. Since none of that code is intended to be consumed externally, the remaining function names should probably all be prefixed by an underscore, so they're known to be private.

Omar may want to clarify or update this, since it's just my understanding.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all spot on, no notes 👍

Comment on lines +126 to +146
def _parse_cpus_arg(cpus_arg: str, max_cpu: int) -> Set[int]:
"""
Parse argument to -c for cpu restriction (e.g. '1,3,5-8').

:param cpus_arg: str
:param max_cpu: int
:returns: a set of specified cpus
"""
cpus = set()
for part in cpus_arg.split(","):
part = part.strip()
if "-" in part:
start, end = part.split("-")
for cpu in range(int(start), int(end) + 1):
if 0 <= cpu < max_cpu:
cpus.add(cpu)
elif part:
cpu = int(part)
if 0 <= cpu < max_cpu:
cpus.add(cpu)
return cpus
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, this is a useful helper on its own. It should probably belong in drgn.helpers.common.format or something like that, and could be renamed so that it doesn't specifically mention CPUs, just ranges of integers.

print(" [no tasks queued]")


def timestamp_str(ns: int) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny thought, but it may be a better API to represent the task_lastrun2now() as a datetime.timedelta().

And totally unrelatedly, Python has a bulit-in function divmod(), which returns both parts of the integer division. I guess it might be more efficient, but I think it just helps shorten and make it a bit more readable. EG:

>>> value = 123456
>>> mins, secs = divmod(value, 60)
>>> hours, mins = divmod(mins, 60)
>>> days, hours = divmod(hours, 24)
>>> days, hours, mins, secs
(1, 10, 17, 36)

Comment on lines +357 to +367
@crash_command(
description="Display the tasks on the run queues of each cpu.",
arguments=(
argument("-t", action="store_true", dest="show_timestamps"),
argument("-T", action="store_true", dest="show_lag"),
argument("-m", action="store_true", dest="pretty_runtime"),
argument("-g", action="store_true", dest="group"),
argument("-c", type=str, default="", dest="cpus"),
drgn_argument,
),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From playing around with the runq command, it seems to me that a lot of these arguments are (thankfully) mutually exclusive, and crash just doesn't document that's the case. The only option that can be combined with the others is -c.

  • Each of -t | -T | -m are separate modes that behave similarly. They will iterate over the runqueues and print out some sort of timestamp information in various formats. They cannot be combined.
  • The alternative mode is to print the contents (i.e. tasks) in each runqueue, with -g optionally modifying the way they are printed.

It seems fair to me that we may implement each mode in a separate function, to simplify things. And we could make the mutual exclusion explicit here:

Suggested change
@crash_command(
description="Display the tasks on the run queues of each cpu.",
arguments=(
argument("-t", action="store_true", dest="show_timestamps"),
argument("-T", action="store_true", dest="show_lag"),
argument("-m", action="store_true", dest="pretty_runtime"),
argument("-g", action="store_true", dest="group"),
argument("-c", type=str, default="", dest="cpus"),
drgn_argument,
),
)
@crash_command(
description="Display the tasks on the run queues of each cpu.",
arguments=(
mutually_exclusive_group(
argument("-t", action="store_true", dest="show_timestamps"),
argument("-T", action="store_true", dest="show_lag"),
argument("-m", action="store_true", dest="pretty_runtime"),
argument("-g", action="store_true", dest="group"),
),
argument("-c", type=str, default="", dest="cpus"),
drgn_argument,
),
)

return "%d %02d:%02d:%02d.%03d" % (days, hours, mins, secs, ms)


def run_queue(prog: Program, args: argparse.Namespace) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing missing here is an implementation of the --drgn argument. The more code we refactor into reusable drgn helpers, the easier it becomes to print the equivalent drgn code. In fact, I think starting from the --drgn output may be a really good way to think about how you want to refactor the code into helpers. For instance, here's some pseudocode I've made up, which we may output for runq --drgn -c 1,3-5:

from drgn.helpers.common.parse import range_list
from drgn.helpers.linux.percpu import per_cpu
from drgn.helpers.linux.sched import task_lastrun2now

for cpu in range_list("1,3-5"):
    rq = per_cpu(prog["runqueues"], cpu)
    runtime_ns = task_lastrun2now(rq.curr)

Starting from there, gives a good idea what helpers we'd like to have, how they'd be used, and where they should be imported from.

Copy link
Owner

@osandov osandov Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again totally agree with the general advice. Your example does raise a small question: should --drgn output use the arguments given in its example output (range_list("1,3-5")) or use placeholders (range_list(cpu_list))? I opted for the latter for the mount command to make the generated code more general, but an argument could be made for the former, too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slight preference for inserting the arguments into the script, so that ideally the output could be redirected to a file and directly sourced or run via the drgn command without modification.

But then again, substituting an Object into the text wouldn't really make sense... You could output a constructor call (Object(prog, "struct foo *", value=0xdeadbeef)) but that doesn't seem like it would be very helpful - it wouldn't at all be applicable to other programs or situations. So considering the complications of doing this consistently, I think you're right - using a placeholder variable name seems like the best way forward. Maybe there should be a comment at the beginning, containing all the placeholders that the user should define?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe rather than a comment, we could do this, which makes it clear what's happening but also makes it easy to reassign or change:

cpu_ranges = "1,3-5"
for cpu in range_list(cpu_ranges):
    ...

And for mount:

task = Object(prog, "struct task_struct *", 0xdeadbeef)
# or
task = find_task(prog, 1)
...

This way things are still easy to source and run, but all the variable declarations based on input parameters are placed at the top, so they're easy to see and edit.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I loved this suggestion, so I modified mount to do this, and I'm also making the crash command test cases exec() the output of --drgn by default to enforce that the output is always valid and runnable.

Comment on lines +175 to +177
prio_array = (
hex(runqueue.rt.active.address_ - 16) if runqueue.rt.active.address_ else 0
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we subtract 16 here? And why wouldn't runqueue.rt.active.address_of_() work here?

@richl9
Copy link
Author

richl9 commented Jul 1, 2025

@brenns10 Thanks a lot for the detailed comments! Thera are a lot to digest. I will go over them one by one.

@osandov osandov force-pushed the commands branch 2 times, most recently from 44dc4c1 to eb2fab3 Compare July 2, 2025 22:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants