Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

session->LoadMessageEventBuilder Destructor SIGSEV different thread #426

Closed
1 task done
kclonts opened this issue Sep 27, 2024 · 11 comments
Closed
1 task done

session->LoadMessageEventBuilder Destructor SIGSEV different thread #426

kclonts opened this issue Sep 27, 2024 · 11 comments
Assignees
Labels
A-Client Area: C++ SDK bug Something isn't working

Comments

@kclonts
Copy link

kclonts commented Sep 27, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

// thread 1
Session* sess = new Session();

// thread 2
{
MessageEventBuilder b;
session->LoadMessageEventBuilder(&b);
}

SIGSEV on destructor from g_newDeleteAllocatorSingleton

Both Async & Sync mode

Documentation states all Session methods are thread enabled

Expected Behavior

no sigsev; MessageEventBuilder is valid

Steps To Reproduce

see description

BlazingMQ Version

0.92.10, 0.92.6, latest

Anything else?

No response

@kclonts kclonts added the bug Something isn't working label Sep 27, 2024
@pniedzielski pniedzielski added the A-Client Area: C++ SDK label Sep 27, 2024
@pniedzielski
Copy link
Collaborator

Hi @kclonts,

Thanks for your report! It seems like a bug, but just a little bit more information:

  1. What platform are you on? OSX version, Linux, architecture, etc
  2. Can you give us a minimal code example (.cpp file) that shows this?

@kclonts
Copy link
Author

kclonts commented Oct 1, 2024

Rocky Linux 9 (up to date) X86_64. 7970wx

Clang 18.1.8, debug and release builds, w/ & w/o TSAN respectively

Updating (after coming up with my workaround, a single-thread on the BMQ start session thread guarded w/ mutex to access builders), the confirm event builder experiences the same issue

// Needs Nothing more, nothing less. Many variations on adding message, reset, etc.
void bldrEvntThreadFailure(BloombergLP::bmqa::Session* sess) {
	BloombergLP::bmqa::MessageEventBuilder msgBldr;
	sess->loadMessageEventBuilder(&msgBldr);
	// on dtor throws SIGSEV
}

void bldrConfThreadFailure(BloombergLP::bmqa::Session* sess) {
	BloombergLP::bmqa::ConfirmEventBuilder confBldr;
	sess->loadConfirmEventBuilder(&confBldr);
	// on dtro throws SIGSEV
}

void main() {
	auto opts = BloombergLP::bmqt::SessionOptions();
	opts.setBrokerUri(std::getenv("BMQ_URI"));
	opts.setNumProcessingThreads(4);
	auto sess = new BloombergLP::bmqa::Session(opts);

	int rc = sess->start();
	if (rc != 0) {
		// out-of-scope to example
		exit(1);
	}

	// Fails w/ Event Builder
	auto t1 = std::thread(&bldrEvntThreadFailure, sess);
	t1.join();

	// Fails w/ Confirm Event Builder
	auto t2 = std::thread(&bldrConfThreadFailure, sess);
	t2.join();
}

@pniedzielski
Copy link
Collaborator

Thanks! As far as we can tell, this code should work, so we'll work on reproducing this.

@dorjesinpo
Copy link
Collaborator

You have to call session stop before the exit. Without that, session (internal) threads may use allocators that have destructed.

@dorjesinpo
Copy link
Collaborator

..also, sess leaks

@pniedzielski
Copy link
Collaborator

I'm able to reproduce this issue with and without calling stop and deleting sess. For this backtrace I also added pthread thread names to the std::thread we're creating to distinguish. Error is indeed at end of scope on t1.

(lldb) thread backtrace all
thread backtrace all
  thread #1, queue = 'com.apple.main-thread'
    frame #0: 0x000000018817ab8c libsystem_kernel.dylib`__ulock_wait + 8
    frame #1: 0x00000001881bc48c libsystem_pthread.dylib`_pthread_join + 608
    frame #2: 0x00000001880f1aa8 libc++.1.dylib`std::__1::thread::join() + 36
    frame #3: 0x00000001006e0c08 a.out`main + 336
    frame #4: 0x0000000187e2f154 dyld`start + 2476
  thread #2, name = 'bmqFSMEvtQ'
    frame #0: 0x000000018817c5cc libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001881ba55c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x000000010033d138 a.out`BloombergLP::bslmt::ConditionImpl<BloombergLP::bslmt::Platform::PosixThreads>::wait(this=0x000000013200e090, mutex=0x000000013200e050) at bslmt_conditionimpl_pthread.h:188:12
    frame #3: 0x000000010033c490 a.out`BloombergLP::bslmt::Condition::wait(this=0x000000013200e090, mutex=0x000000013200e050) at bslmt_condition.h:380:18
    frame #4: 0x0000000100546ef8 a.out`BloombergLP::bdlcc::SingleConsumerQueueImpl<bsl::shared_ptr<BloombergLP::bmqimp::Event>, BloombergLP::bsls::AtomicOperations, BloombergLP::bslmt::Mutex, BloombergLP::bslmt::Condition>::popFront(this=0x000000013200e040, value=0x000000016fe86e88) at bdlcc_singleconsumerqueueimpl.h:961:37
    frame #5: 0x00000001004c3230 a.out`BloombergLP::bdlcc::SingleConsumerQueue<bsl::shared_ptr<BloombergLP::bmqimp::Event>>::popFront(this=0x000000013200e040, value=0x000000016fe86e88) at bdlcc_singleconsumerqueue.h:369:19
    frame #6: 0x00000001004c2c68 a.out`BloombergLP::bmqimp::BrokerSession::fsmThreadLoop(this=0x000000013200d510) at bmqimp_brokersession.cpp:3178:40
    frame #7: 0x00000001005c7d2c a.out`BloombergLP::bdlf::MemFnInstance<void (BloombergLP::bmqimp::BrokerSession::*)(), BloombergLP::bmqimp::BrokerSession*>::operator()(this=0x00006000031384f8) const at bdlf_memfn.h:1025:12
    frame #8: 0x00000001005c7bd0 a.out`BloombergLP::bslmt::EntryPointFunctorAdapter<BloombergLP::bdlf::MemFnInstance<void (BloombergLP::bmqimp::BrokerSession::*)(), BloombergLP::bmqimp::BrokerSession*>>::invokerFunction(adapterRaw=0x00006000031384f0) at bslmt_entrypointfunctoradapter.h:313:5
    frame #9: 0x00000001000be7d0 a.out`bslmt_EntryPointFunctorAdapter_invoker + 16
    frame #10: 0x00000001881b9f94 libsystem_pthread.dylib`_pthread_start + 136
  thread #3, name = 'bmqScheduler'
    frame #0: 0x000000018817c5cc libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001881ba55c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x00000001000be600 a.out`BloombergLP::bslmt::ConditionImpl<BloombergLP::bslmt::Platform::PosixThreads>::timedWait(BloombergLP::bslmt::Mutex*, BloombergLP::bsls::TimeInterval const&) + 212
    frame #3: 0x00000001001288ac a.out`BloombergLP::bdlmt::EventScheduler::dispatchEvents() + 484
    frame #4: 0x000000010012e9fc a.out`BloombergLP::bslmt::EntryPointFunctorAdapter<BloombergLP::bdlf::Bind<BloombergLP::bslmf::Nil, void (BloombergLP::bdlmt::EventScheduler::*)(), BloombergLP::bdlf::Bind_BoundTuple1<BloombergLP::bdlmt::EventScheduler*>>>::invokerFunction(void*) + 96
    frame #5: 0x00000001000be7d0 a.out`bslmt_EntryPointFunctorAdapter_invoker + 16
    frame #6: 0x00000001881b9f94 libsystem_pthread.dylib`_pthread_start + 136
  thread #4, name = 'bmqimp-0'
    frame #0: 0x000000018817ef20 libsystem_kernel.dylib`kevent + 8
    frame #1: 0x00000001002cf868 a.out`BloombergLP::ntco::Kqueue::run(void*) + 316
    frame #2: 0x00000001002697e0 a.out`BloombergLP::ntcr::Interface::run(void*) + 792
    frame #3: 0x00000001881b9f94 libsystem_pthread.dylib`_pthread_start + 136
  thread #5, name = 'dns-system'
    frame #0: 0x000000018817c5cc libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001881ba55c libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x00000001000be600 a.out`BloombergLP::bslmt::ConditionImpl<BloombergLP::bslmt::Platform::PosixThreads>::timedWait(BloombergLP::bslmt::Mutex*, BloombergLP::bsls::TimeInterval const&) + 212
    frame #3: 0x0000000100131058 a.out`BloombergLP::bdlmt::ThreadPool::workerThread() + 472
    frame #4: 0x0000000100130e74 a.out`ThreadPoolEntry + 12
    frame #5: 0x00000001000c0dbc a.out`bslmt_threadutil_namedFuncPtrThunk + 140
    frame #6: 0x00000001881b9f94 libsystem_pthread.dylib`_pthread_start + 136
* thread #6, name = 't1', stop reason = signal SIGABRT
  * frame #0: 0x00000001881815d0 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x00000001881b9c20 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x00000001880c6ac4 libsystem_c.dylib`__abort + 136
    frame #3: 0x00000001880b825c libsystem_c.dylib`__stack_chk_fail + 96
    frame #4: 0x00000001006e09b8 a.out`bldrEvntThreadFailure(BloombergLP::bmqa::Session*) + 124
    frame #5: 0x0000000100892f88 a.out`BloombergLP::g_newDeleteAllocatorSingleton_p (.0) + 8

@pniedzielski
Copy link
Collaborator

@kclonts
It seems like the error isn't quite related to thread-safety, but using threads let us see it more readily. The underlying error is that our build scripts compile with a BMQ_ENABLE_MSG_GROUPID macro set, whose presence changes the size of a few key types, notably the bmqa::Message inside the builders. Message Group IDs are a feature that we added to the protocol, but do not have plans to add to the client yet. However, it looks like our build system is setting it incorrectly..

I think the immediate fix to your problem is compiling your program with -DBMQ_ENABLE_MSG_GROUPID. Could you try this and confirm that I've reproduced the same error? If that's the case, we'll have some work to do with our build system.

@kclonts
Copy link
Author

kclonts commented Oct 1, 2024

@pniedzielski

Setting at the CLI running CMake, I get this

CMake Warning:
  Manually-specified variables were not used by the project:

    BMQ_ENABLE_MSG_GROUPID

And using the new library, the issue is not resolved

@kclonts
Copy link
Author

kclonts commented Oct 2, 2024

OK! Looked at your PR, I realized what I needed to do here to get this working

Enabling add_definitions("-DBMQ_ENABLE_MSG_GROUPID") in my CMake for my project, it enables the correct definition for the imported header file bmqa_messages.h and thus it all compiles runs successfully

I really appreciate the help and quick turnaround here

@pniedzielski
Copy link
Collaborator

Excellent, great to hear! It was a bug that this flag was ever on, so we're going to turn it off by default in our next release. This will break that work-around for you, but it's really where we should have been originally. You'll be able to build at that point without -DBMQ_ENABLE_MSG_GROUPID and all should be well.

I'll keep this issue open and close it when we merge that change into main, just to keep you updated.

@pniedzielski
Copy link
Collaborator

Change has been merged into main, so you should be able to build without -DBMQ_ENABLE_MSG_GROUPID now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Client Area: C++ SDK bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants