Skip to content
This repository was archived by the owner on Mar 3, 2023. It is now read-only.

revert the stmgr clientmgr #3492

Merged
merged 1 commit into from
Mar 25, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions heron/stmgr/src/cpp/manager/stmgr-clientmgr.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -115,7 +115,7 @@ bool StMgrClientMgr::DidAnnounceBackPressure() {
return stream_manager_->DidAnnounceBackPressure();
}

shared_ptr<StMgrClient> StMgrClientMgr::CreateClient(const sp_string& _other_stmgr_id,
StMgrClient* StMgrClientMgr::CreateClient(const sp_string& _other_stmgr_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. This is strange because shared_ptr<> should be safer and I can't tell what can be wrong in the original code. @dnrusakov should have better idea.

Copy link
Contributor

@dnrusakov dnrusakov Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huijunw
What you have basically done is allocating some memory but never cleaning it up.
It may lead to memory leaks.
You don't observe that problem any more because now you never call a destructor for StMgrClientMgr (previously shared_ptr called the destructor of StMgrClientMgr as soon as the object is out of scope or the destructor of enclosing object is called).
To fix the original issue you need to run stream manager under Valgrind, then reproduce the crash locally, then take a look at the Valgrind log to see what is the precise stack trace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @nwangtw

Thanks.

Yeah I think you are right. Shared ptr is reference based hence this crash is quite strange. The only possible way is a bad cast somewhere that cause the ref count messed up.

Copy link
Member Author

@huijunwu huijunwu Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when a new physical plan is issued from tmaster, the stmgr-client-manager swaps the old stmgr out from array and explicitly calls quit() destructor.
btw, this pr just reverts it to v0.20.0, those lines have not been changed till v0.20.1 for 4 years and we did not see any issue during those 4 years. thus believe it is safe

const sp_string& _hostname, sp_int32 _port) {
stmgr_clientmgr_metrics_->scope(METRIC_STMGR_NEW_CONNECTIONS)->incr();
NetworkOptions options;
Expand All @@ -126,7 +126,7 @@ shared_ptr<StMgrClient> StMgrClientMgr::CreateClient(const sp_string& _other_stm
options.set_high_watermark(high_watermark_);
options.set_low_watermark(low_watermark_);
options.set_socket_family(PF_INET);
auto client = make_shared<StMgrClient>(eventLoop_, options, topology_name_, topology_id_,
StMgrClient* client = new StMgrClient(eventLoop_, options, topology_name_, topology_id_,
stmgr_id_, _other_stmgr_id, this, metrics_manager_client_,
droptuples_upon_backpressure_);
client->Start();
Expand Down
4 changes: 2 additions & 2 deletions heron/stmgr/src/cpp/manager/stmgr-clientmgr.h
Original file line number Diff line number Diff line change
Expand Up @@ -79,11 +79,11 @@ class StMgrClientMgr {
virtual bool AllStMgrClientsRegistered();

private:
shared_ptr<StMgrClient> CreateClient(const sp_string& _other_stmgr_id,
StMgrClient* CreateClient(const sp_string& _other_stmgr_id,
const sp_string& _host_name, sp_int32 _port);

// map of stmgrid to its client
std::unordered_map<sp_string, shared_ptr<StMgrClient>> clients_;
std::unordered_map<sp_string, StMgrClient*> clients_;

sp_string topology_name_;
sp_string topology_id_;
Expand Down