Skip to content

config-linux: add intelRdt.enableMonitoring #1287

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 29, 2025

Conversation

marquiz
Copy link
Contributor

@marquiz marquiz commented Jun 3, 2025

Add a parameter for enabling per-container resctrl monitoring.

This supersedes the previous enableCMT and enableMBM settings whose functionality was very vaguely specified. Separate parameter for every monitoring metric does not seem to make much sense, in particular because in the resctrl filesystem it is not possible to selectively enable a subset of the monitoring features. You always get all the metrics that the system provides. Also, with separate settings (and corresponding check if the specific metric is available) the user cannot specify "enable whatever is available" - setting everything to "true" might fail because one of the metrics is not available on the platform. In addition, having separate parameters is very future-unproof, making support for new monitoring metrics unnecessarily cumbersome to add. New metrics are certain to be added in new hardware generations, e.g. perf/energy monitoring in the near future (https://lkml.org/lkml/2025/5/21/1631), and requiring an update to the runtime-spec for each one of them feels like an overkill without much benefits. It is easier to have one switch for "enable container-specific metrics" and let the user read whatever metrics the platform provides.

Moreover, it is not even possible to turn off monitoring (from the resctrl filesystem). For example, you always get the metrics for all CTRL_MON groups (closIDs). However, that is not always very useful as there likely are a lot of applications packed in the same group. The new intelRdt.enableMontoring parameter will enable creation of a MON group specific to a single container allowing monitoring of resctrl metrics on per-container granularity.

The intelRdt.enableCMT and intelRdt.enableMBM are removed from the spec. They are vaguely specified, problematic and have no known implementations.

NOTE: I checked all the runtimes listed in implementeations.md and none of them have implemented the enableCMT and enableMBM so I wonder if there would be an option to just drop those(?)

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is enableMonitoring == enableCMT || EnableMBM?

@marquiz
Copy link
Contributor Author

marquiz commented Jun 3, 2025

enableMonitoring == enableCMT || EnableMBM?

Not quite. It's not specified in the spec what should happen if enableCMT=true but the platform does not support CMT.

EDIT: so I think it probably almost is but nobody knows, I think :/

EDIT2: and of course there are the future cases, e.g. if the upcoming Perf/Energy monitoring is available but both CMT and MBM unavailable

@marquiz marquiz changed the title config-linux: add intelRdt.enableMontoring config-linux: add intelRdt.enableMonitoring Jun 3, 2025
Copy link
Contributor

@kad kad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
And if there is no existing implementations that are using old/deprecated flags, I'd propose to drop them completely, as it is proven to be impractical over all those years.

@giuseppe
Copy link
Member

giuseppe commented Jun 3, 2025

And if there is no existing implementations that are using old/deprecated flags, I'd propose to drop them completely, as it is proven to be impractical over all those years.

+1 from crun, they are not implemented/used.

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@marquiz
Copy link
Contributor Author

marquiz commented Jun 4, 2025

+1 from crun, they are not implemented/used.

Check. I added a separate patch that drops these fields (this could be of course debated in a separate PR, too)

@marquiz
Copy link
Contributor Author

marquiz commented Jun 5, 2025

@AkihiroSuda @utam0k PTAL

for the container.

* **`enableMBM`** *(boolean, OPTIONAL)* - specifies if Intel RDT MBM should be enabled:
* MBM (Memory Bandwidth Monitoring) supports monitoring of total and local memory bandwidth
Copy link
Member

@AkihiroSuda AkihiroSuda Jun 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The history of this technically-breaking change should be documented as a note?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thanks @AkihiroSuda, I think that makes sense (I was thinking something along the lines I described in the commit message(s)). Any suggestion where to put that note?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts @AkihiroSuda? Or would it be better to split the removal of this fields into a separate PR(?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit messages are practically invisible to readers, so it should be just written in the markdown file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I now added a NOTE to config-linux.md (in the IntelRdt section).

@AkihiroSuda PTAL

@AkihiroSuda AkihiroSuda added this to the v1.3.0 milestone Jun 19, 2025
Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please squash the commits

Add a parameter for enabling per-container resctrl monitoring.

This supersedes and replaces the previous "enableCMT" and "enableMBM"
settings whose functionality was very vaguely specified. Separate
parameter for every monitoring metric does not seem to make much sense, in
particular because in the resctrl filesystem it is not possible to
selectively enable a subset of the monitoring features. You always get
all the metrics that the system provides. Also, with separate settings
(and corresponding check if the specific metric is available) the user
cannot specify "enable whatever is available" - setting everything to
"true" might fail because one of the metrics is not available on the
platform. In addition, having separate parameters is very
future-unproof, making support for new monitoring metrics unnecessarily
cumbersome to add. New metrics are certain to be added in new hardware
generations, e.g. perf/energy monitoring in the near future
(https://lkml.org/lkml/2025/5/21/1631), and requiring an update to the
runtime-spec for each one of them feels like an overkill without much
benefits. It is easier to have one switch for "enable container-specific
metrics" and let the user read whatever metrics the platform provides.

Moreover, it is not even possible to turn off monitoring (from the
resctrl filesystem). For example, you always get the metrics for all
CTRL_MON groups (closIDs). However, that is not always very useful as
there likely are a lot of applications packed in the same group. The new
intelRdt.enableMontoring parameter will enable creation of a MON group
specific to a single container allowing monitoring of resctrl metrics on
per-container granularity.

Signed-off-by: Markus Lehtonen <[email protected]>
@marquiz
Copy link
Contributor Author

marquiz commented Jun 25, 2025

LGTM, but please squash the commits

Done

Copy link
Member

@utam0k utam0k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@utam0k utam0k merged commit 34a39b9 into opencontainers:main Jun 29, 2025
4 checks passed
@marquiz marquiz deleted the devel/rdt branch June 30, 2025 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants