Describe __cxa_thread_atexit. #136

197 changes: 141 additions & 56 deletions abi.html
@@ -4079,22 +4079,21 @@ <h5><a href="#ctor-order-linker"> 3.3.5.5 Linker Processing </a></h5>

<p>
<a name="dso-dtor">
<h4><a href="#dso-dtor"> 3.3.6 DSO Object Destruction API </a></h4>
<h4><a href="#dso-dtor"> 3.3.6 Global and Thread-Local Object Destruction API </a></h4>

<p>
<a name="dso-dtor-motivation">
<h5><a href="#dso-dtor-motivation"> 3.3.6.1 Motivation </a></h5>

<p>
The C++ Standard requires that destructors be called for global objects
when a program exits in the opposite order of construction.
Most implementations have handled this by calling the C library
<code>atexit</code> routine to register the destructors.
This is problematic because the 1999 C Standard only requires that the
implementation support 32 registered functions,
although most implementations support many more.
More important,
it does not deal at all with the ability in most implementations to
The C++ Standard requires that destructors be called for global
objects when a program exits. Most implementations have handled
this by calling the C library <code>atexit</code> routine to
register the destructors.
This is problematic because the C standard only requires that the
implementation support 32 registered functions, although most
implementations support many more. More importantly, it does not
deal at all with the ability in most implementations to
remove DSOs from a running program image by calling
<code>dlclose</code> prior to program termination.

@@ -4107,73 +4106,156 @@ <h5><a href="#dso-dtor-motivation"> 3.3.6.1 Motivation </a></h5>

<p>
<a name="dso-dtor-runtime-data">
<h5><a href="#dso-dtor-runtime-data"> 3.3.6.2 Runtime Data Structure </a></h5>
<h5><a href="#dso-dtor-runtime-data">3.3.6.2 Runtime Data Structure</a></h5>

<p>
The runtime library shall maintain a list of termination functions
with the following information about each:
The runtime library shall maintain a list of <i>global termination
functions</i> with the following information about each:

<ul>
<li> A function pointer (a pointer to a function descriptor on Itanium).
<li> A void* operand to be passed to the function.
<li> A void* handle for the <i>home DSO</i> of the entry (below).
<li> A <code>void*</code> operand to be passed to the function.
<li> A <code>void*</code> handle for the <i>home DSO</i> of the entry (below).
</ul>

<p>
The representation of this structure is implementation defined.
Entries in this list need not be unique. If the same function
is registered multiple times, it is called multiple times as
described below.

<p>
The runtime library shall also maintain an analogous list of
<i>thread-local termination functions</i> for each thread, with
the same information about each.

<p>
The representation of these lists is implementation defined.
All references are via the API described below.
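
As a rough illustration of the two kinds of list described above, the following C++ sketch models an entry and its storage. The names `TerminationEntry`, `global_list`, `thread_list`, `noop_termination`, and `register_twice` are hypothetical; the ABI text leaves the representation implementation defined, so this is only one possible shape and not what any particular runtime does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One entry in a termination-function list, mirroring the three fields
// listed in the text. The names are illustrative, not part of the ABI.
struct TerminationEntry {
    void (*function)(void*);  // termination function to call
    void* operand;            // argument passed to the function
    void* home_dso;           // handle identifying the registering DSO
};

// Hypothetical storage: one process-wide list and one list per thread.
// A real runtime would also need synchronization for the global list.
inline std::vector<TerminationEntry>& global_list() {
    static std::vector<TerminationEntry> list;
    return list;
}

inline std::vector<TerminationEntry>& thread_list() {
    thread_local std::vector<TerminationEntry> list;
    return list;
}

// A termination function that does nothing, used only for the demo.
inline void noop_termination(void*) {}

// Entries need not be unique: registering the same function twice
// produces two entries, and therefore two later calls.
// Returns how many entries were added to the global list.
inline std::size_t register_twice(void (*f)(void*)) {
    assert(f != nullptr);
    std::vector<TerminationEntry>& list = global_list();
    const std::size_t before = list.size();
    list.push_back({f, nullptr, nullptr});
    list.push_back({f, nullptr, nullptr});
    return list.size() - before;
}
```

Keeping the per-thread list in a `thread_local` variable is a convenience of the sketch; the proposed text only requires that each thread have its own list.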

<a name="dso-dtor-runtime-api"> <!-- legacy anchor -->
<a name="global-dtor">
<h5><a href="#global-dtor">3.3.6.3 Destructors for global objects</a></h5>

<p>
<a name="dso-dtor-runtime-api">
<h5><a href="#dso-dtor-runtime-api"> 3.3.6.3 Runtime API </a></h5>
After constructing an object with static storage duration
that will require destruction on process exit, a global
termination function is registered by calling the following
function:

<code><pre>extern "C" int __cxa_atexit(void (*f)(void *), void *p, void *d);</pre></code>

<p>The third argument, <code>d</code>, must be the value
<code>&__dso_handle</code> for the DSO which defines the object;
see below. The first and second arguments may be chosen at the
implementation's convenience.

<ol type=A>
<p>
<li> Object construction:
The runtime library will call <code>f(p)</code> when the unloading
of DSO <code>d</code> is required, including when the process exits.
The call will occur before the DSO is actually unloaded from the
process. Calls to global termination functions will be made in the
reverse order of their registration.

<p>
After constructing a global (or local static) object,
that will require destruction on exit,
a termination function is <i>registered</i> as follows:
<center><code>
extern "C" int __cxa_atexit ( void (*f)(void *), void *p, void *d );
</code></center>
This registration, e.g. <code>__cxa_atexit(f,p,d)</code>,
is intended to cause the call <code>f(p)</code> when DSO <code>d</code> is unloaded,
before all such termination calls registered before this one.
It returns zero if registration is successful, nonzero on failure.
Note that global termination functions may be registered
concurrently. If the registrations of two separate termination
functions are not well-ordered by the <i>strongly happens before</i>
relation, the order in which they are called is unspecified.

<p>
The registration function is not called from within the constructor.
Collaborator Author

I removed this sentence (and an analogous sentence from the proposed thread-local section) because, as far as I can tell, they're either wrong or not really something that should be specified in the ABI:

  • This certainly can't be talking about a class constructor.
  • If this is talking about a global constructor function, surely the registration function is called from the global constructor, because I can't imagine what else would call it.
  • If this is talking about a period of execution called "the constructor", well, that's not defined by the standard, and it's unclear what it would mean for the registration call to not be a part of it.
  • The registration call needs to be part of the __cxa_guard region. If you finish the initialization, end the guard, and then do the registration (knowing that you just did the initialization), you lose the interthread ordering that the standard requires: the registration can race with a registration made by some other thread that has observed that the initialization is complete.

<code>__cxa_atexit</code> returns zero if registration is successful,
nonzero on failure.
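
The registration described in this section can be written out by hand as a sketch. The following assumes an Itanium C++ ABI toolchain in which `__cxa_atexit` and `__dso_handle` are available to link against, which is an assumption about the platform rather than something this text guarantees. `Widget`, `widget_storage`, `destroy_widget`, and `init_widget` are hypothetical names; real compilers emit equivalent code directly instead of calling a helper like this.

```cpp
#include <cassert>
#include <new>

// Hand-written declarations of the ABI entry point and the per-DSO handle.
// Assumption: the C library and startup files provide both symbols.
extern "C" int __cxa_atexit(void (*f)(void*), void* p, void* d);
extern "C" void* __dso_handle;

struct Widget {
    int value;
    explicit Widget(int v) : value(v) {}
    ~Widget() { value = 0; }
};

// Raw storage standing in for a global object of type Widget.
alignas(Widget) static unsigned char widget_storage[sizeof(Widget)];

// Termination function: converts the void* operand back to the object
// and runs its destructor.
static void destroy_widget(void* p) {
    static_cast<Widget*>(p)->~Widget();
}

// Roughly what a dynamic initializer does: construct the object, then
// register its destruction against the defining DSO.
// Returns the result of __cxa_atexit (zero on success).
int init_widget() {
    Widget* w = ::new (static_cast<void*>(widget_storage)) Widget(42);
    assert(w->value == 42);
    return __cxa_atexit(destroy_widget, w, &__dso_handle);
}
```

Here the first two arguments are a destructor thunk and the object address, which is one common choice; the text above allows the implementation to pick them freely.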

<a name="thread-local-dtor">
<h5><a href="#thread-local-dtor">3.3.6.4 Destructors for thread-local objects</a></h5>

<p>
<li> User <code>atexit</code> calls:
After constructing an object with thread storage duration
that will require destruction on process or thread exit,
a thread-local termination function is registered for the
current thread by calling the following function:

<code><pre>extern "C" int __cxa_thread_atexit(void (*f)(void *), void *p, void *d);</pre></code>

<p>The third argument, <code>d</code>, must be the value
<code>&__dso_handle</code> for the DSO which defines the object;
see below. The first and second arguments may be chosen at the
implementation's convenience.

<p>
When the user registers exit functions with <code>atexit</code>,
they should be registered with NULL parameters and DSO handles, i.e.
<center><code>
__cxa_atexit ( f, NULL, NULL );
</code></center>
It is expected that implementations supporting both C and C++ will
integrate this capability into the libc <code>atexit</code>
implementation so that C-only DSOs will nevertheless interact with C++
programs in a C++-standard-conforming manner.
No user interface to <code>__cxa_atexit</code> is supported,
so the user is not able to register an <code>atexit</code> function
with a parameter or a home DSO.
The runtime library will call <code>f(p)</code> if the

This requires memory allocation, but there is no way to report memory allocation failure. Thread creation is probably a way more common occurrence than global initialization (despite dlopen), so fixing this for __cxa_thread_atexit seems more important than for __cxa_atexit.

One way to fix this is to have an out-of-spec mechanism by which the link editor communicates the maximum number of objects needing such registration to the run-time library. (With ELF, it would probably not involve symbols, but some other mechanism.)

Collaborator Author @rjmccall, Jan 31, 2022

I agree that the ABI would be better if the allocation for this wasn't dynamic. A simpler solution than adding a new kind of runtime data collection would be for the compiler to allocate some fixed amount of additional global/thread-local memory for global/thread-local objects that need destruction, the same way that it allocates additional memory for a guard variable. For thread-locals, this memory would be wasted if it was eagerly allocated and the object was never touched on the thread, but so would the memory for the object itself, and overall that seems like an acceptable cost to me.

However, __cxa_thread_atexit is not actually new ABI: it's been implemented for years, and the proposal just never made it into the document, for various not-very-good reasons. So any new proposal here would have to be a v2 recommendation rather than an actual change to __cxa_thread_atexit.

__cxa_thread_atexit does in fact have a return value that it can use to report failure. Neither GCC nor Clang seems to actually check it; I don't know if any compilers do. Frankly, I'm not sure we're really allowed to fail once the initializer has successfully terminated; it would have to be by throwing an exception, and there's no provision for doing so. That's probably the strongest argument for changing the ABI.

registering thread either terminates or exits the process (e.g.
by calling <code>std::exit</code>). Calls will be made in the
reverse order of registration. Calls will be made from the
registering thread.

<p>
<li> Termination:
If the process is exited by a thread, it is unspecified whether
thread-local termination functions registered for different
threads are called. This applies even if another thread has
terminated but has not yet finished calling all of its
termination functions.

<p>
When linking any DSO containing a call to <code>__cxa_atexit</code>,
If the unloading of a DSO is required other than during process
exit (e.g. by calling <code>dlclose</code>), and there are any
thread-local termination functions associated with that DSO,
the behavior is undefined. Some implementations choose to support
this by delaying the unloading of the DSO until all such functions
have been called on all threads. Implementations that wish to
unload the DSO immediately are encouraged to at least call
thread-local termination functions registered for the current
thread.

<p>
<code>__cxa_thread_atexit</code> returns zero if registration
is successful, nonzero on failure.
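
The ordering rule for thread-local termination functions can be modelled in a few lines of C++. This is a sketch only: `model_thread_atexit` and `model_run_thread_exit` are hypothetical stand-ins for `__cxa_thread_atexit` and for whatever the runtime does at thread exit, and the demo runs everything on the calling thread instead of hooking real thread termination.

```cpp
#include <cassert>
#include <vector>

struct ThreadEntry {
    void (*f)(void*);  // termination function
    void* p;           // operand
    void* d;           // home DSO handle (unused by this model)
};

// One list per thread, created on first use by that thread.
static std::vector<ThreadEntry>& current_thread_list() {
    thread_local std::vector<ThreadEntry> list;
    return list;
}

// Model of __cxa_thread_atexit: zero on success, nonzero on failure.
int model_thread_atexit(void (*f)(void*), void* p, void* d) {
    assert(f != nullptr);
    try {
        current_thread_list().push_back({f, p, d});
    } catch (...) {
        return -1;  // allocation failed
    }
    return 0;
}

// Model of the thread-exit step: call the current thread's entries in
// reverse order of registration. Each entry is popped before it is
// called, so a termination function may itself register further entries.
void model_run_thread_exit() {
    std::vector<ThreadEntry>& list = current_thread_list();
    while (!list.empty()) {
        ThreadEntry e = list.back();
        list.pop_back();
        e.f(e.p);
    }
}

// Demo helpers: each operand records its id in a shared log when called.
struct Tagged {
    std::vector<int>* log;
    int id;
};

static void record_id(void* p) {
    Tagged* t = static_cast<Tagged*>(p);
    t->log->push_back(t->id);
}

// Registers ids 1, 2, 3 and returns the order in which they were called.
std::vector<int> demo_thread_exit_order() {
    std::vector<int> log;
    Tagged a{&log, 1}, b{&log, 2}, c{&log, 3};
    model_thread_atexit(record_id, &a, nullptr);
    model_thread_atexit(record_id, &b, nullptr);
    model_thread_atexit(record_id, &c, nullptr);
    model_run_thread_exit();
    return log;
}
```

Because the list is per thread, the calls necessarily happen on the registering thread, matching the requirement above. The model ignores the `dlclose` case, which the text leaves undefined.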

<a name="atexit">
<h5><a href="#atexit">3.3.6.5 User calls to <code>atexit</code></a></h5>

<p>
The C++ standard requires calls to exit functions registered with
<code>atexit</code> to occur in reverse order of registration,
appropriately interordered with the destruction of global objects
according to the completion order of their initialization; see
<span class="cxxref">[basic.start.term]</span>. It is expected
that implementations supporting C++ will integrate support for
<code>__cxa_atexit</code> into the C library in order to achieve
these semantics.

<p>
There is no analogous thread-specific <code>atexit</code> API
in the C or C++ standards, or in POSIX threads. If there were,
it would be expected to be appropriately interordered with the
destruction of thread-local objects. It is currently unclear
what the appropriate interordering rules are for the destruction
of thread-local storage such as C11's <code>tss_create</code> or
POSIX's <code>pthread_key_create</code>; these APIs do not
specify any particular destruction order for multiple keys.

<p>
The C++ standard does not provide a user interface corresponding
to <code>__cxa_atexit</code> or <code>__cxa_thread_atexit</code>.
There is no way to register an <code>atexit</code> function
with a parameter or a home DSO.

<a name="termination">
<h5><a href="#termination">3.3.6.6 Termination</a></h5>

<p>
When linking any DSO containing a call to <code>__cxa_atexit</code>
or <code>__cxa_thread_atexit</code>,
the linker should define a hidden symbol <code>__dso_handle</code>,
with a value which is an address in one of the object's segments.
(It does not matter what address,
as long as they are different in different DSOs.)
It should also include a call to the following function in the FINI
(It does not matter what address, as long as they are different in

In the GNU implementation, it must be an address within the object defining that hidden symbol (not an absolute address). I suggest dropping the remark in parentheses; it's confusing.

Collaborator Author

I agree that the definition of __dso_handle is arguably outside of the ABI's scope; I'll think about how to reword this.

different DSOs.)

<p>
Additionally, DSOs that contain a call to <code>__cxa_atexit</code>
should also include a call to the following function in the FINI
list (to be executed first):
<center><code>
extern "C" void __cxa_finalize ( void *d );
</code></center>
<code><pre>extern "C" void __cxa_finalize ( void *d );</pre></code>
The parameter passed should be <code>&__dso_handle</code>.

<p>
@@ -4196,8 +4278,10 @@ <h5><a href="#dso-dtor-runtime-api"> 3.3.6.3 Runtime API </a></h5>

<p>
When the main program calls <code>exit</code>,
it must call any remaining <code>__cxa_atexit</code>-registered functions,
either by calling <code>__cxa_finalize(NULL)</code>,
it must first call any <code>__cxa_thread_atexit</code>-registered
functions for the exiting thread.
Next, it must call any remaining <code>__cxa_atexit</code>-registered
functions, either by calling <code>__cxa_finalize(NULL)</code>,
or by walking the registration list itself.

<p>
@@ -4208,8 +4292,9 @@ <h5><a href="#dso-dtor-runtime-api"> 3.3.6.3 Runtime API </a></h5>
</ol>

<p>
Since <code>__cxa_atexit</code> and <code>__cxa_finalize</code>
must both manipulate the same termination function list,
Since calls to <code>__cxa_atexit</code>,
<code>__cxa_thread_atexit</code>, and <code>__cxa_finalize</code>
must manipulate the same termination function lists,
they must be defined in the implementation's runtime library,
rather than in the individual linked objects.
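
To make the shared-list requirement concrete, here is a small single-threaded C++ model of a registration function and a finalize function operating on one list. `model_atexit` and `model_finalize` are hypothetical stand-ins for `__cxa_atexit` and `__cxa_finalize`. The per-DSO filtering shown is an assumption based on the description of `d` above, and the model does not handle termination functions that register new entries while finalization is running.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Entry {
    void (*f)(void*);  // termination function
    void* p;           // operand
    void* d;           // home DSO handle
};

// The single list that both functions manipulate (no locking here).
static std::vector<Entry> g_entries;

// Model of __cxa_atexit: append an entry, return zero on success.
int model_atexit(void (*f)(void*), void* p, void* d) {
    assert(f != nullptr);
    g_entries.push_back({f, p, d});
    return 0;
}

// Model of __cxa_finalize: walk the list in reverse registration order.
// With d == nullptr every remaining entry is called; otherwise only the
// entries whose home DSO is d. Each entry is removed before it is called,
// so it can never run twice.
void model_finalize(void* d) {
    for (std::size_t i = g_entries.size(); i-- > 0;) {
        if (d != nullptr && g_entries[i].d != d) {
            continue;
        }
        Entry e = g_entries[i];
        g_entries.erase(g_entries.begin() + static_cast<std::ptrdiff_t>(i));
        e.f(e.p);
    }
}

// Demo helpers: each operand records its id in a shared log when called.
struct Tagged {
    std::vector<int>* log;
    int id;
};

static void record_id(void* p) {
    Tagged* t = static_cast<Tagged*>(p);
    t->log->push_back(t->id);
}

// Two pretend DSOs: A registers ids 1 and 3, B registers id 2.
// "Unloading" A and then exiting the process yields the returned order.
std::vector<int> demo_finalize_order() {
    std::vector<int> log;
    char dso_a = 0, dso_b = 0;  // only their addresses matter
    Tagged t1{&log, 1}, t2{&log, 2}, t3{&log, 3};
    model_atexit(record_id, &t1, &dso_a);
    model_atexit(record_id, &t2, &dso_b);
    model_atexit(record_id, &t3, &dso_a);
    model_finalize(&dso_a);   // calls 3, then 1
    model_finalize(nullptr);  // calls the remaining entry, 2
    return log;
}
```

If each linked object carried its own copy of `g_entries`, `model_finalize(nullptr)` could not see entries registered elsewhere, which is the reason the text gives for placing these functions in the runtime library.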
