Skip to content
This repository was archived by the owner on Mar 3, 2023. It is now read-only.

revert the stmgr clientmgr #3492

Merged
merged 1 commit into from
Mar 25, 2020
Merged

revert the stmgr clientmgr #3492

merged 1 commit into from
Mar 25, 2020

Conversation

huijunwu
Copy link
Member

fix #3477

@huijunwu huijunwu requested review from xiaoyao1991 and nwangtw March 18, 2020 01:51
@@ -115,7 +115,7 @@ bool StMgrClientMgr::DidAnnounceBackPressure() {
return stream_manager_->DidAnnounceBackPressure();
}

shared_ptr<StMgrClient> StMgrClientMgr::CreateClient(const sp_string& _other_stmgr_id,
StMgrClient* StMgrClientMgr::CreateClient(const sp_string& _other_stmgr_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. This is strange because shared_ptr<> should be safer and I can't tell what can be wrong in the original code. @dnrusakov should have better idea.

@@ -115,7 +115,7 @@ bool StMgrClientMgr::DidAnnounceBackPressure() {
return stream_manager_->DidAnnounceBackPressure();
}

shared_ptr<StMgrClient> StMgrClientMgr::CreateClient(const sp_string& _other_stmgr_id,
StMgrClient* StMgrClientMgr::CreateClient(const sp_string& _other_stmgr_id,
Copy link
Contributor

@dnrusakov dnrusakov Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huijunw
What you have basically done is allocating some memory but never cleaning it up.
It may lead to memory leaks.
You don't observe that problem any more because now you never call a destructor for StMgrClientMgr (previously shared_ptr called the destructor of StMgrClientMgr as soon as the object is out of scope or the destructor of enclosing object is called).
To fix the original issue you need to run stream manager under Valgrind, then reproduce the crash locally, then take a look at the Valgrind log to see what is the precise stack trace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @nwangtw

Thanks.

Yeah I think you are right. Shared ptr is reference based hence this crash is quite strange. The only possible way is a bad cast somewhere that cause the ref count messed up.

Copy link
Member Author

@huijunwu huijunwu Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when a new physical plan is issued from tmaster, the stmgr-client-manager swaps the old stmgr out from array and explicitly calls quit() destructor.
btw, this pr just reverts it to v0.20.0, those lines have not been changed till v0.20.1 for 4 years and we did not see any issue during those 4 years. thus believe it is safe

@huijunwu huijunwu merged commit f76f479 into master Mar 25, 2020
sreev pushed a commit to sreev/incubator-heron that referenced this pull request Apr 9, 2020
@huijunwu huijunwu deleted the huijunw/20200317 branch July 11, 2020 09:21
nicknezis pushed a commit that referenced this pull request Sep 14, 2020
Co-authored-by: Huijun Wu <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stmgr coredump
5 participants