[fix] [broker] broker log a full thread dump when a deadlock is detected in healthcheck every time #22916

Merged
merged 2 commits into from
Jun 20, 2024

Conversation

yyj8
Contributor

@yyj8 yyj8 commented Jun 15, 2024

Fixes #22915

Motivation

The broker logs a full thread dump every time the health check detects a deadlock.

Expected behavior:
A full thread dump is logged the first time a deadlock is detected. After that, a full dump is logged at most once per interval, as defined by the following constant:

// org.apache.pulsar.broker.admin.impl.BrokersBase.java
private static final long LOG_THREADDUMP_INTERVAL_WHEN_DEADLOCK_DETECTED = 600000L;
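
The expected behavior can be sketched as a small throttle that takes the current time as a parameter. This is a minimal illustration, not code from BrokersBase: the class ThreadDumpThrottle and the method shouldLogFullDump are hypothetical names, and only the interval value is taken from the constant above.

```java
// Hypothetical helper, not part of Pulsar: isolates the "first time, then once per interval" rule.
final class ThreadDumpThrottle {
    // Same value as LOG_THREADDUMP_INTERVAL_WHEN_DEADLOCK_DETECTED (10 minutes in milliseconds).
    static final long INTERVAL_MS = 600_000L;

    // 0 means that no full thread dump has been logged yet.
    private long lastFullDumpMs = 0L;

    /** Returns true if a full dump should be logged at nowMs, and records that time. */
    synchronized boolean shouldLogFullDump(long nowMs) {
        if (lastFullDumpMs == 0L || nowMs - lastFullDumpMs > INTERVAL_MS) {
            lastFullDumpMs = nowMs;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ThreadDumpThrottle throttle = new ThreadDumpThrottle();
        long t0 = 1_000_000L; // arbitrary start time for the demo
        System.out.println(throttle.shouldLogFullDump(t0));            // true: first detection
        System.out.println(throttle.shouldLogFullDump(t0 + 60_000L));  // false: inside the interval
        System.out.println(throttle.shouldLogFullDump(t0 + 600_001L)); // true: interval has elapsed
    }
}
```

Passing the time in as an argument, instead of calling System.currentTimeMillis() inside the method, is a choice made only to keep the sketch deterministic.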

Modifications

  1. In org.apache.pulsar.broker.admin.impl.BrokersBase, the field threadDumpLoggedTimestamp is now declared static. This prevents it from being re-initialized to 0 on every call to the health check API.

  2. The checkDeadlockedThreads method before and after the change:

(1) before:

private void checkDeadlockedThreads() {
    ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
    long[] threadIds = threadBean.findDeadlockedThreads();
    if (threadIds != null && threadIds.length > 0) {
        ThreadInfo[] threadInfos = threadBean.getThreadInfo(threadIds, false, false);
        String threadNames = Arrays.stream(threadInfos)
                .map(threadInfo -> threadInfo.getThreadName() + "(tid=" + threadInfo.getThreadId() + ")")
                .collect(Collectors.joining(", "));
        if (System.currentTimeMillis() - threadDumpLoggedTimestamp
                > LOG_THREADDUMP_INTERVAL_WHEN_DEADLOCK_DETECTED) {
            threadDumpLoggedTimestamp = System.currentTimeMillis();
            LOG.error("Deadlocked threads detected. {}\n{}", threadNames,
                    ThreadDumpUtil.buildThreadDiagnosticString());
        } else {
            LOG.error("Deadlocked threads detected. {}", threadNames);
        }
        throw new IllegalStateException("Deadlocked threads detected. " + threadNames);
    }
}

(2) after:

private void checkDeadlockedThreads() {
    ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
    long[] threadIds = threadBean.findDeadlockedThreads();
    if (threadIds != null && threadIds.length > 0) {
        ThreadInfo[] threadInfos = threadBean.getThreadInfo(threadIds, false, false);
        String threadNames = Arrays.stream(threadInfos)
                .map(threadInfo -> threadInfo.getThreadName() + "(tid=" + threadInfo.getThreadId() + ")")
                .collect(Collectors.joining(", "));
        if ((System.currentTimeMillis() - threadDumpLoggedTimestamp
                > LOG_THREADDUMP_INTERVAL_WHEN_DEADLOCK_DETECTED)
                || threadDumpLoggedTimestamp == 0) {
            threadDumpLoggedTimestamp = System.currentTimeMillis();
            LOG.error("Deadlocked threads detected. {}\n{}", threadNames,
                    ThreadDumpUtil.buildThreadDiagnosticString());
        } else {
            LOG.error("Deadlocked threads detected. {}", threadNames);
        }
        throw new IllegalStateException("Deadlocked threads detected. " + threadNames);
    }
}
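
The effect of the static keyword in modification 1 can be illustrated with a self-contained simulation. It assumes that a new resource object serves each health check request, which is what "every time the interface is called" suggests but which this description does not state outright. PerRequestResource and its methods are hypothetical stand-ins, not Pulsar classes.

```java
// Illustrative sketch only: PerRequestResource is a hypothetical stand-in, not Pulsar code.
final class PerRequestResource {
    static final long INTERVAL_MS = 600_000L;

    // Instance field: starts at 0 in every new object, so the timestamp is lost between requests.
    private long instanceTimestamp;
    // Static field: one value shared by all instances, so the timestamp survives between requests.
    private static long sharedTimestamp;

    boolean fullDumpUsingInstanceField(long nowMs) {
        if (instanceTimestamp == 0L || nowMs - instanceTimestamp > INTERVAL_MS) {
            instanceTimestamp = nowMs;
            return true;
        }
        return false;
    }

    boolean fullDumpUsingStaticField(long nowMs) {
        synchronized (PerRequestResource.class) {
            if (sharedTimestamp == 0L || nowMs - sharedTimestamp > INTERVAL_MS) {
                sharedTimestamp = nowMs;
                return true;
            }
            return false;
        }
    }

    /** Simulates a series of health checks, each served by a fresh object, and counts full dumps. */
    static int countFullDumps(boolean useStaticField, int requests, long startMs, long stepMs) {
        synchronized (PerRequestResource.class) {
            sharedTimestamp = 0L; // reset so that repeated simulations are independent
        }
        int fullDumps = 0;
        for (int i = 0; i < requests; i++) {
            PerRequestResource resource = new PerRequestResource(); // assumption: new object per request
            long now = startMs + i * stepMs;
            boolean full = useStaticField
                    ? resource.fullDumpUsingStaticField(now)
                    : resource.fullDumpUsingInstanceField(now);
            if (full) {
                fullDumps++;
            }
        }
        return fullDumps;
    }

    public static void main(String[] args) {
        // Five health checks, 10 seconds apart, all within one 10-minute interval.
        System.out.println(countFullDumps(false, 5, 1_000_000L, 10_000L)); // 5: full dump every time
        System.out.println(countFullDumps(true, 5, 1_000_000L, 10_000L));  // 1: full dump only once
    }
}
```

The sketch guards the shared field with a class-level lock because concurrent requests could otherwise race on it. Whether the real field needs such protection is not addressed in this pull request.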

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:
yyj8#9

@github-actions github-actions bot added the doc-not-needed label (Your PR changes do not impact docs) Jun 15, 2024
@yyj8 yyj8 requested a review from hanmz June 18, 2024 01:49
Member

@lhotari lhotari left a comment

LGTM

@lhotari lhotari merged commit ca64505 into apache:master Jun 20, 2024
50 of 54 checks passed
lhotari pushed a commit that referenced this pull request Jun 20, 2024
…ted in healthcheck every time (#22916)

(cherry picked from commit ca64505)
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Jun 21, 2024
…ted in healthcheck every time (apache#22916)

(cherry picked from commit ca64505)
(cherry picked from commit c9de1bb)
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Jun 24, 2024
…ted in healthcheck every time (apache#22916)

(cherry picked from commit ca64505)
(cherry picked from commit c9de1bb)
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Jun 25, 2024
…ted in healthcheck every time (apache#22916)

(cherry picked from commit ca64505)
(cherry picked from commit c9de1bb)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 1, 2024
…ted in healthcheck every time (apache#22916)

(cherry picked from commit ca64505)
(cherry picked from commit c9de1bb)
@lhotari lhotari added this to the 4.0.0 milestone Oct 14, 2024
hanmz pushed a commit to hanmz/pulsar that referenced this pull request Feb 12, 2025
Development

Successfully merging this pull request may close these issues.

[Bug] [broker] broker log a full thread dump when a deadlock is detected in healthcheck every time
3 participants