-
Notifications
You must be signed in to change notification settings - Fork 579
config-linux: add intelRdt.enableMonitoring #1287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is enableMonitoring == enableCMT || EnableMBM
?
Not quite. It's not specified in the spec what should happen if EDIT: so I think it probably almost is but nobody knows, I think :/ EDIT2: and of course there are the future cases, e.g. if the upcoming Perf/Energy monitoring is available but both CMT and MBM unavailable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
And if there is no existing implementations that are using old/deprecated flags, I'd propose to drop them completely, as it is proven to be impractical over all those years.
+1 from crun, they are not implemented/used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Check. I added a separate patch that drops these fields (this could be of course debated in a separate PR, too) |
@AkihiroSuda @utam0k PTAL |
for the container. | ||
|
||
* **`enableMBM`** *(boolean, OPTIONAL)* - specifies if Intel RDT MBM should be enabled: | ||
* MBM (Memory Bandwidth Monitoring) supports monitoring of total and local memory bandwidth |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The history of this technically-breaking change should be documented as a note?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 thanks @AkihiroSuda, I think that makes sense (I was thinking something along the lines I described in the commit message(s)). Any suggestion where to put that note?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any thoughts @AkihiroSuda? Or would it be better to split the removal of this fields into a separate PR(?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit messages are practically invisible to readers, so it should be just written in the markdown file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I now added a NOTE to config-linux.md (in the IntelRdt section).
@AkihiroSuda PTAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but please squash the commits
Add a parameter for enabling per-container resctrl monitoring. This supersedes and replaces the previous "enableCMT" and "enableMBM" settings whose functionality was very vaguely specified. Separate parameter for every monitoring metric does not seem to make much sense, in particular because in the resctrl filesystem it is not possible to selectively enable a subset of the monitoring features. You always get all the metrics that the system provides. Also, with separate settings (and corresponding check if the specific metric is available) the user cannot specify "enable whatever is available" - setting everything to "true" might fail because one of the metrics is not available on the platform. In addition, having separate parameters is very future-unproof, making support for new monitoring metrics unnecessarily cumbersome to add. New metrics are certain to be added in new hardware generations, e.g. perf/energy monitoring in the near future (https://lkml.org/lkml/2025/5/21/1631), and requiring an update to the runtime-spec for each one of them feels like an overkill without much benefits. It is easier to have one switch for "enable container-specific metrics" and let the user read whatever metrics the platform provides. Moreover, it is not even possible to turn off monitoring (from the resctrl filesystem). For example, you always get the metrics for all CTRL_MON groups (closIDs). However, that is not always very useful as there likely are a lot of applications packed in the same group. The new intelRdt.enableMontoring parameter will enable creation of a MON group specific to a single container allowing monitoring of resctrl metrics on per-container granularity. Signed-off-by: Markus Lehtonen <[email protected]>
Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Add a parameter for enabling per-container resctrl monitoring.
This supersedes the previous
enableCMT
andenableMBM
settings whose functionality was very vaguely specified. Separate parameter for every monitoring metric does not seem to make much sense, in particular because in the resctrl filesystem it is not possible to selectively enable a subset of the monitoring features. You always get all the metrics that the system provides. Also, with separate settings (and corresponding check if the specific metric is available) the user cannot specify "enable whatever is available" - setting everything to "true" might fail because one of the metrics is not available on the platform. In addition, having separate parameters is very future-unproof, making support for new monitoring metrics unnecessarily cumbersome to add. New metrics are certain to be added in new hardware generations, e.g. perf/energy monitoring in the near future (https://lkml.org/lkml/2025/5/21/1631), and requiring an update to the runtime-spec for each one of them feels like an overkill without much benefits. It is easier to have one switch for "enable container-specific metrics" and let the user read whatever metrics the platform provides.Moreover, it is not even possible to turn off monitoring (from the resctrl filesystem). For example, you always get the metrics for all CTRL_MON groups (closIDs). However, that is not always very useful as there likely are a lot of applications packed in the same group. The new intelRdt.enableMontoring parameter will enable creation of a MON group specific to a single container allowing monitoring of resctrl metrics on per-container granularity.
The
intelRdt.enableCMT
andintelRdt.enableMBM
are removed from the spec. They are vaguely specified, problematic and have no known implementations.NOTE: I checked all the runtimes listed in
implementeations.md
and none of them have implemented theenableCMT
andenableMBM
so I wonder if there would be an option to just drop those(?)