Skip to content

feat(ctl): add API to detect bottleneck actors #21149

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Apr 8, 2025
Merged

Conversation

KeXiangWang
Copy link
Contributor

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

Highest Release Version: 2.3

What's changed and what's your intention?

Add an API in risectl to using await tree to locate the bottleneck in streaming graphs. Currently, the API will first pull for a new await tree dump and analyze on it. Ideally, it should also be able to read from an existing diagnose file or await tree file. Will add this later.

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • My PR contains critical fixes that are necessary to be merged into the latest release.

Documentation

  • My PR needs documentation updates.
Release note

Copy link
Contributor

@kwannoel kwannoel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be better to decouple the API for fetching the await-tree json, and the API to detect bottleneck actors.

That's because our bottleneck-detecting strategy may not be perfect. So we may wish to fetch the await-tree json from the user cluster first.

Then we can iterate on the bottleneck detecting algorithm, and tweak the tool to suit the await-tree.

Besides that, great work!

@KeXiangWang KeXiangWang force-pushed the wkx/await-tree-improve branch from c94b84a to e9c79a7 Compare March 28, 2025 02:53
@KeXiangWang KeXiangWang force-pushed the wkx/await-tree-improve branch from e9c79a7 to 648ac9f Compare March 28, 2025 02:55
@kwannoel kwannoel mentioned this pull request Mar 28, 2025
Comment on lines 354 to 357
let result = client
.stack_trace(StackTraceRequest::default())
.await
.map_err(err)?;
Copy link
Member

@xxchan xxchan Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may also allow specifying format here in the meta dashboard.

And can do analyze there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be better to decouple the API for fetching the await-tree json, and the API to detect bottleneck actors.
That's because our bottleneck-detecting strategy may not be perfect. So we may wish to fetch the await-tree json from the user cluster first.

@kwannoel read my mind. I choose to write this in risectl is for decoupling dumping and analyzing. But of course we can allow one to specify format through meta's API.

@xxchan
Copy link
Member

xxchan commented Mar 28, 2025

Actually I prefer a js/web based tool to analyze the bottleneck. 🤔 That would be faster to iterate and demonstrate. But it's also fine to have one in risectl.

@kwannoel
Copy link
Contributor

kwannoel commented Mar 28, 2025

Tested with this SQL (release build):

create table t(id int, v1 int);

create table t2(id int, v1 int);

insert into t select id, 1 from generate_series(1, 10000) t(id);

insert into t2 select id, 1 from generate_series(1, 10000) t(id);

flush;

create materialized view mv as select t.id, count(t.v1) from t join t2 on t.v1 = t2.v1 group by t.id;

Works well for me:

>> Actor 129
Actor 129: `mv` [21.285s]  <== current
  Epoch 8251479171792896 [!!! 21.283s]
    Materialize 8100000007 [!!! 21.283s]
      Project 8100000006 [!!! 21.273s]
        HashAgg 8100000005 [!!! 21.273s]
          Merge 8100000004 [0.001s]
[Detached 3]
  LocalInput (actor 134) [0.001s]

>> Actor 130
Actor 130: `mv` [21.285s]  <== current
  Epoch 8251479171792896 [!!! 21.284s]
    Materialize 8200000007 [!!! 21.284s]
      Project 8200000006 [!!! 21.273s]
        HashAgg 8200000005 [!!! 21.273s]
          Merge 8200000004 [0.009s]
[Detached 3]
  LocalInput (actor 134) [0.009s]

>> Actor 131
Actor 131: `mv` [21.285s]  <== current
  Epoch 8251479171792896 [!!! 21.283s]
    Materialize 8300000007 [!!! 21.283s]
      Project 8300000006 [!!! 21.279s]
        HashAgg 8300000005 [!!! 21.279s]
          Merge 8300000004 [0.007s]
[Detached 3]
  LocalInput (actor 134) [0.007s]

>> Actor 132
Actor 132: `mv` [21.285s]
  Epoch 8251479171792896 [!!! 21.283s]
    Materialize 8400000007 [!!! 21.283s]
      Project 8400000006 [!!! 21.280s]
        HashAgg 8400000005 [!!! 21.280s]
          Merge 8400000004 [0.000s]  <== current

@kwannoel
Copy link
Contributor

Ideally, it should also be able to read from an existing diagnose file or await tree file.

+1 for this, it takes a long time for users to upgrade.

@KeXiangWang KeXiangWang marked this pull request as ready for review March 29, 2025 17:38
@graphite-app graphite-app bot requested a review from a team March 29, 2025 17:56
Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work! Mainly just nit picking.

What about we first get the changes to get JSON format merged (maybe can also cherry-picked). And refine the analyzer part later?

Comment on lines 88 to 94
if req.actor_traces_format.is_some()
&& req.actor_traces_format.as_ref().unwrap() == "text"
{
v.to_string()
} else {
serde_json::to_string(&v).unwrap()
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic is different from the comment: "// Allowed values: "json", "text". Default is "text"."

Here it defaults to json.

Maybe better to define an enum instead of using string

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

use crate::cmd_impl::await_tree::tree::TreeView;
use crate::cmd_impl::await_tree::utils::extract_actor_traces;

pub fn transcribe(path: String) -> anyhow::Result<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking that this functionality can be provided by the library itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with that.

/// the bottleneck actor is still yielding output to downstream actors. A typical
/// case is JOIN amplification. So the corresponding actors are actively processing
/// the data but the EPOCH span is blocked.
pub fn is_bottleneck(&self) -> bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, the algorithm looks much easier than I thought initially. So there's no need to traverse between actors (based on the Input and Output spans) at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I'm not entirely sure this algorithm works in every situation. However, in most cases where there's only one bottleneck actor, it should perform as expected. I can think of a few edge cases:

  1. If the bottleneck's span doesn't call instrument_await, the bottleneck's tree might not exhibit the typical "slow parent with fast children" signal.
  2. If an MV or the entire graph has multiple fragments acting as bottlenecks, I'm uncertain what the outcome would be.

At present, this tool doesn't completely eliminate manual effort. It definitely requires ongoing optimization and the addition of new rules.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2. I'm uncertain what the outcome would be.

Shall we output all of them, ordering by some sort of confidence or probablity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tend to leave it for future. At present, we don't have too much experience dealing with these edge cases. We can add some new rules when we can summarize some tips for them. For now, if the current bottleneck_detect return nothing, the on-caller should just use transcribe to convert the json await tree to text and mannually analyse the text await tree.

@KeXiangWang KeXiangWang requested a review from a team as a code owner April 2, 2025 22:33
@KeXiangWang KeXiangWang requested a review from cyliu0 April 2, 2025 22:33
@@ -31,6 +31,7 @@ risingwave_pb = { workspace = true }
risingwave_rpc_client = { workspace = true }
risingwave_storage = { workspace = true }
risingwave_stream = { workspace = true }
rw-diagnose-tools = { git = "https://github.com/risingwavelabs/rw-diagnose-tools" }
Copy link
Contributor Author

@KeXiangWang KeXiangWang Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use github url for now. Need @BugenZhao 's help to publish the repo in crates.io.

Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Others LGTM. Thanks for the work

Comment on lines 88 to 94
if req.actor_traces_format.is_some()
&& req.actor_traces_format.as_ref().unwrap() == "text"
{
v.to_string()
} else {
serde_json::to_string(&v).unwrap()
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not resolved

let result = client.stack_trace().await.map_err(err)?;
let result = client
.stack_trace(StackTraceRequest {
actor_traces_format: Some("text".to_owned()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a parameter to the meta dashboard /await_tree call instead. Can default to text

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. Example of a request:

curl "http://localhost:5691/api/monitor/await_tree/?format=text" > diagnose.txt

&& let Ok(result) = client.stack_trace().await
&& let Ok(result) = client
.stack_trace(StackTraceRequest {
actor_traces_format: Some("json".to_owned()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, please add a parameter to the /diagnose API instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example:

curl http://localhost:5691/api/monitor/diagnose/\?actor_traces_format\=json > diagnose.txt

Copy link
Contributor

@kwannoel kwannoel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After address @xxchan comments, rest LGTM. Thanks for the work!

@KeXiangWang KeXiangWang force-pushed the wkx/await-tree-improve branch from ca3a490 to cb52394 Compare April 4, 2025 00:11
@KeXiangWang KeXiangWang force-pushed the wkx/await-tree-improve branch from cb52394 to 7b581ad Compare April 4, 2025 00:12
Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, Thanks! Should we cherry-pick to earlier releases?

@KeXiangWang KeXiangWang added the need-cherry-pick-release-2.3 [⚠️DEPRECATED] Use `..-since-release-..` instead label Apr 7, 2025
Copy link
Contributor

github-actions bot commented Apr 7, 2025

Warning

The need-cherry-pick-release-xx label is deprecated. Please use need-cherry-pick-since-release-xx instead for future PRs.

Copy link
Member

@BugenZhao BugenZhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@KeXiangWang KeXiangWang added this pull request to the merge queue Apr 8, 2025
Merged via the queue into main with commit f09d594 Apr 8, 2025
33 of 34 checks passed
@KeXiangWang KeXiangWang deleted the wkx/await-tree-improve branch April 8, 2025 04:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants