
Exceptions not being properly caught on OS X #203


Closed
poulson opened this issue Nov 26, 2016 · 12 comments

Comments

@poulson
Member

poulson commented Nov 26, 2016

It seems that exceptions are not being properly handled on OS X, as I see errors of the following form when running the examples/lapack_like/Hilbert driver, which is expected to throw exceptions when attempting to run Cholesky on the Hilbert matrix with float and double:

localhost:build-llvm-git-mpich-Release-64 poulson$ lldb ./bin/examples/lapack_like/Hilbert 
(lldb) target create "./bin/examples/lapack_like/Hilbert"
Current executable set to './bin/examples/lapack_like/Hilbert' (x86_64).
(lldb) run
Process 37911 launched: './bin/examples/lapack_like/Hilbert' (x86_64)
Attempting to solve Hilbert system with float
Process 37911 stopped
* thread #1: tid = 0x6a8148, 0x00007fffb18bedda libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x00007fffb18bedda libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fffb18bedda <+10>: jae    0x7fffb18bede4            ; <+20>
    0x7fffb18beddc <+12>: movq   %rax, %rdi
    0x7fffb18beddf <+15>: jmp    0x7fffb18b7d6f            ; cerror_nocancel
    0x7fffb18bede4 <+20>: retq
(lldb) bt
* thread #1: tid = 0x6cb9dc, 0x00007fffb18bedda libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fffb18bedda libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fffb19a9797 libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fffb1824440 libsystem_c.dylib`abort + 129
    frame #3: 0x0000000104421ce1 libgcc_s.1.dylib`uw_init_context_1 + 353
    frame #4: 0x0000000104422567 libgcc_s.1.dylib`_Unwind_Resume + 55
    frame #5: 0x000000010014d10e libEl.0.dylib`void El::LogicError<char [26]>(char const (&) [26]) + 462
    frame #6: 0x000000010122febc libEl.0.dylib`void El::cholesky::LowerVariant3Unblocked<float>(El::Matrix<float>&) + 268
    frame #7: 0x0000000101216419 libEl.0.dylib`void El::cholesky::LowerVariant3Blocked<float>(El::Matrix<float>&) + 217
    frame #8: 0x000000010168fc2e libEl.0.dylib`void El::hpd_solve::Overwrite<float>(El::UpperOrLowerNS::UpperOrLower, El::OrientationNS::Orientation, El::Matrix<float>&, El::Matrix<float>&) + 30
    frame #9: 0x0000000100005236 Hilbert`void SolveHilbert<float>(long long, long long, bool) + 310
    frame #10: 0x0000000100004d86 Hilbert`main + 614
    frame #11: 0x00007fffb1790255 libdyld.dylib`start + 1
    frame #12: 0x00007fffb1790255 libdyld.dylib`start + 1
@poulson
Member Author

poulson commented Nov 26, 2016

There is now a Minimum Reproducible Example:

#include <iostream>
#include <sstream>
#include <stdexcept>

namespace NotEl {

inline void BuildStream( std::ostringstream& os ) { }

template<typename T,typename... ArgPack>
void BuildStream
( std::ostringstream& os, const T& item, const ArgPack& ... args )
{
    os << item;
    BuildStream( os, args... );
}

template<typename... ArgPack>
void LogicError( const ArgPack& ... args )
{
    std::ostringstream os;
    BuildStream( os, args... );
    os << std::endl;
    throw std::logic_error( os.str().c_str() );
}

} // namespace NotEl

int main( int argc, char* argv[] )
{
    try {  NotEl::LogicError("Successfully caught exception!"); }
    catch( std::exception& e ) { std::cout << e.what() << std::endl; }
    return 0;
}

@rhl-
Member

rhl- commented Nov 26, 2016

What if it's a const reference?

@poulson
Member Author

poulson commented Nov 26, 2016

Could you be more specific? I am not aware of anything that could be changed to a const reference except for the trivial BuildStream function.

@rhl-
Member

rhl- commented Nov 26, 2016 via email

@poulson
Member Author

poulson commented Nov 26, 2016

For what it's worth, this issue seems to only appear for certain compilation commands, as a standalone build of the above MRE runs just fine with a script of the form:

#!/bin/bash
/usr/bin/clang++ -O3 -std=c++11 -o test.cpp.o -c test.cpp
/usr/bin/clang++   -O3 -std=c++11  -Wl,-search_paths_first -Wl,-headerpad_max_install_names  -Wl,-flat_namespace  -Wl,-commons,use_dylibs test.cpp.o  -o test

but the MRE fails if the sandbox/test.cpp Elemental driver is replaced with the above code and then executed with ./bin/sandbox-test. So there must be some subtle conflict happening.

@poulson
Member Author

poulson commented Nov 26, 2016

After some extensive bisection of the linker flags, it appears that the problem is due to CMake automatically adding the linker flag -Wl,-flat_namespace. When I manually remove it, exceptions seem to be handled correctly. I will need to investigate this further.

EDIT: After looking through the CMake module files (in /usr/local/share/cmake/) I see that only FindPHP4.cmake specifies the flat_namespace option. The problem turns out to be MPICH's mpicxx wrapper specifying final_ldflags=" -Wl,-flat_namespace -Wl,-commons,use_dylibs".

@poulson
Member Author

poulson commented Nov 26, 2016

It seems that this is a known problem: http://lists.mcs.anl.gov/pipermail/petsc-dev/2013-April/011992.html @knepley

@poulson
Member Author

poulson commented Nov 26, 2016

It seems that the fix is to configure MPICH with the option --enable-two-level-namespace. Since it is already necessary to manually compile MPICH on OS X (as discussed in Issue #200), this should only be a minor complication to the current situation. I will close the issue after verifying the fix.

@poulson poulson closed this as completed Nov 27, 2016
@poulson
Member Author

poulson commented Dec 13, 2016

This issue also appears to affect GCC on OS X Sierra. Unfortunately, configuring MPICH with the --enable-two-level-namespace option does not seem to fix the issue.

@jeffhammond
Member

Does Open-MPI work? I wonder how coupled to MPICH this is.

@poulson
Member Author

poulson commented Dec 13, 2016

Unfortunately (fortunately?) it seems to be independent.

Compiling

#include <stdexcept>
int main( int argc, char* argv[] )
{
    try { throw std::logic_error("LogicError"); }
    catch( std::exception& e ) { }
    return 0;
}

via g++-4.9 exception.cpp -o exception and then running with ./exception produced Abort trap: 6 on OS X Sierra with Homebrew GCC 4.9.

@vikumar-ciena

@poulson: Did you find a fix for this abort when throwing an exception? I am facing the same issue on Sierra.
