[SPARK-35714][FOLLOW-UP][CORE] WorkerWatcher should run System.exit in a thread out of RpcEnv
### What changes were proposed in this pull request?
This PR proposes to let `WorkerWatcher` run `System.exit` in a separate thread instead of on one of `RpcEnv`'s threads.
### Why are the changes needed?
`System.exit` triggers the shutdown hook that runs `executor.stop`, which results in the same deadlock as SPARK-14180. Note that since Spark recently upgraded to Hadoop 3, each hook now has a [timeout threshold](https://github.com/apache/hadoop/blob/d4794dd3b2ba365a9d95ad6aafcf43a1ea40f777/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ShutdownHookManager.java#L205-L209) that forcibly interrupts the hook's execution once the timeout is reached. So the deadlock doesn't really exist on the master branch. However, it is still critical for previous releases, and it is incorrect behavior that should be fixed.
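The pattern can be pictured with a small, self-contained Java sketch. It is not the code from this patch: a single-threaded executor stands in for an `RpcEnv` thread, `fakeExit` stands in for `System.exit` plus a shutdown hook that waits for that thread to finish, and the names `ExitOutsideDispatcher`, `exitInNewThread`, and `runDemo` are hypothetical. The assumption is that the hook cannot complete while the thread that requested the exit is still busy, so the exit is handed to a fresh thread and the dispatcher thread returns immediately.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

class ExitOutsideDispatcher {

    // Hypothetical helper: run the exit action on a new thread so that the
    // caller (here, the dispatcher thread) is not blocked by it.
    static Thread exitInNewThread(int status, IntConsumer exitAction) {
        Thread t = new Thread(() -> exitAction.accept(status), "exit-thread");
        t.start();
        return t;
    }

    // Returns the status recorded by the fake exit, or Integer.MIN_VALUE if
    // the simulated shutdown could not complete.
    static int runDemo() {
        // Stand-in for a single RpcEnv thread (assumption for illustration).
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        AtomicInteger recorded = new AtomicInteger(Integer.MIN_VALUE);

        // Stand-in for System.exit + shutdown hook: it stops the dispatcher
        // and waits for the dispatcher thread to terminate. Calling this
        // directly on the dispatcher thread would make it wait for itself.
        IntConsumer fakeExit = status -> {
            dispatcher.shutdown();
            try {
                if (dispatcher.awaitTermination(5, TimeUnit.SECONDS)) {
                    recorded.set(status);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        Thread[] exitThread = new Thread[1];
        try {
            // The "message handler" runs on the dispatcher thread but only
            // starts the exit thread, then returns.
            dispatcher.submit(() -> {
                exitThread[0] = exitInNewThread(-1, fakeExit);
            }).get();
            exitThread[0].join();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return recorded.get();
    }

    public static void main(String[] args) {
        System.out.println("recorded exit status: " + runDemo());
    }
}
```

In this sketch, calling `fakeExit` directly inside the submitted task would leave `awaitTermination` waiting on the very thread that is executing it, which mirrors the situation the patch avoids.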
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Tested manually.
Closes #35069 from Ngone51/fix-workerwatcher-exit.
Authored-by: yi.wu <[email protected]>
Signed-off-by: yi.wu <[email protected]>
(cherry picked from commit 639d6f4)
Signed-off-by: yi.wu <[email protected]>