Replies: 2 comments 4 replies
-
If the fiber pool is counted in the constraint, then a user can spawn a lot of client fibers and the fiber pool can stop working. As a result, iproto will not function and some system tasks that are executed in the pool will hang. The fiber pool has its own limit. I guess limiting just the client fibers will fix the issue when a client has a fork bomb (or code that behaves like a fork bomb).
It is a global configuration option. I'd set it with …
-
Regarding the document

Please add context/terminology information

As someone who doesn't remember all the details at the moment, I would like the document to include context information that defines the terms we use later:

I would also mention when a fiber is picked up from a pool and when it is created from scratch. After this context information it should be easier to understand where we need an additional limit and why. In particular, it is unclear to me whether the proposed limit covers only out-of-pool fibers or whether it also includes fibers from the iproto/tx_user pools.

Please add a user scenario

We can discuss limiting the fiber pool that serves iproto requests, limiting the overall number of fibers, or something different, but we should verify each of the variants against some user scenario: is it enough to solve it? Does it solve similar problems of this kind? I think we should write the scenario down to make the discussion more structured.

I propose to perform a dynamic analysis

The document has two statements:

Assuming that a page is 4 KiB, these statements look conflicting. Let's describe things in terms of VMAs and VM pages -- this way it is clear that there can be free virtual pages while the VMA threshold is reached; these are different things. Also, I guess that the numbers in the document were obtained using a static analysis of the code. Let's perform some dynamic analysis to verify the code observations. I propose to perform some experiments with a varying balance between request processing time (using …).

What happens if the limit is reached?

I see at least three options (in the context of serving an iproto request, which seems to be our user scenario):

I guess that the RFC assumes fail-fast, but it is not written down anywhere in an explicit way.

YAML configuration

Let's also include in the document how the new options are configured in the new YAML-configuration-based startup flow.

Regarding the idea

I have doubts about the idea of the new static limit, which seems similar to …. The user sees a 'cannot allocate memory' error without any details and asks for help. It appears that the default …. So, is the whole problem in the diagnostics? Maybe we should consider something that would give statistics about fibers: system/user, iproto_pool/tx_user_pool/background, and the count of VMAs in each of the categories (if possible) or overall. I would also mention the kernel option name (…). We can also consider exporting the VMA count (as a percentage of all available?) into metrics to let admins set up alerting based on it. What do you think about reconsidering this task as an activity toward better observability?

Footnotes
-
Reviewers
Tickets
core: fibers engine backpressure #10169
In some cases, a single Tarantool instance could potentially use too much memory, leading to resource exhaustion.
To prevent such scenarios, it is proposed to introduce a limit on the number of user-created fibers.
Context and Terminology
System and User Fibers
Tarantool employs fibers (lightweight cooperative threads) to execute tasks. These fibers can be categorized into two groups:
User fibers: fibers created by users via the `fiber` Lua API (e.g. `fiber.create()`), as well as fibers that handle client requests or user transactions. User fibers can be short-lived or long-lived depending on the application logic, and their count can grow or shrink dynamically based on client behavior.

Existing Fiber Pools (iproto and tx_user_pool)

Tarantool uses fiber pools to recycle fibers for frequent operations, reducing the overhead of constantly creating and destroying fibers. There are two notable fiber pools in the current design:

The network (iproto) thread utilizes a pool of fibers to handle incoming client requests (which are parsed from the network). Each incoming request message may be assigned a fiber from this pool for execution in the transaction thread (tx thread). Tarantool limits the number of concurrent fibers processing network requests via the configuration parameter `iproto.net_msg_max`. By capping concurrent in-flight messages, it indirectly limits the number of fibers used for network requests. This prevents unlimited fiber proliferation from a flood of network requests.

The transaction-thread pool (`tx_user_pool`): in the transaction (tx) thread, there is a similar pool for fibers that execute user transactions or requests forwarded from the iproto thread. This pool reuses fibers for running the Lua or internal transaction routines that constitute user requests. It works in tandem with the iproto fiber pool: when a network request arrives, the iproto thread passes it to the tx thread, which takes a fiber from `tx_user_pool` to execute the request. The size or concurrency of this pool is effectively governed by the same `iproto.net_msg_max` limit, since that limits how many requests can be processed in parallel in the tx thread.

Each fiber pool has an upper bound on the number of fibers it will create. If a request arrives and a free fiber is available in the pool, it will be reused; if no free fiber is available but the pool has not reached its limit, a new fiber will be created and added to the pool. If the pool is at its maximum and all fibers are busy, the incoming request will be queued or blocked until a fiber becomes available.
Fibers Created Outside Any Pool
Not all fibers in the system come from these managed pools. Fibers created outside any pool are those spawned directly via the fiber API or by other subsystems without using the iproto/tx fiber pooling. For instance, when a user calls `fiber.create(function() ... end)` in a Lua application, a new fiber is allocated on the fly. These fibers are not drawn from a pre-allocated pool because their usage is entirely application-driven and unpredictable. Similarly, some internal modules or background tasks may create fibers on demand outside of the iproto request flow (for example, a background fiber for periodic tasks in a module).

The behavior of fibers created outside a pool is straightforward: each `fiber.create()` allocates a new fiber structure and a stack for it (typically via a memory allocation or mmap call to reserve stack space, including a guard page for safety). Tarantool does maintain a registry of all live fibers in a cord thread, but there is currently no built-in limit on how many fibers can be created outside of the pools. The only practical limit is memory. This means a user could create thousands of fibers via the API, which might eventually exhaust system resources or degrade performance.
To summarize the types of fibers:
It’s important to note that system fibers (like the ones for replication or WAL) are typically created at specific events (configuring replication or starting the instance) and are not drawn from a pool either — they are just created as needed, but these are few in number (usually on the order of the number of replicas or subsystems active). They also fall outside the iproto and tx user fiber pools, but since they are finite and critical, we consider them separately from user-created fibers. In the current implementation, any fiber that is not obtained from a pool can be considered either a user fiber or a system fiber depending on who triggered its creation. This RFC’s scope is limiting client-triggered (user) fiber creation, and it will exclude system fibers and fiber pool fibers from its new limiting mechanism.
Goal
Introduce a mechanism to limit the number of user-created fibers in Tarantool in order to improve stability and prevent uncontrolled resource consumption. User applications or clients can inadvertently create thousands of fibers, for example by starting a new fiber for each task or request without any limit. This can lead to excessive memory usage (each fiber has a stack and a control block) and even exhaustion of OS resources (e.g., hitting the Linux `vm.max_map_count` limit on virtual memory areas due to too many stacks). The goal is to provide a safety net that: … (`iproto.net_msg_max`).

To summarize the current state: … (`net_msg_max`).
Each fiber allocates two pages of memory; the simultaneous creation of thousands of fibers can therefore lead to significant memory consumption.
In the system, each fiber's memory allocation is composed of two major parts:
Fiber Stack:
The default fiber stack size is set using a CMake variable. In our sources, we have the following macro:
This means that, by default, each fiber is allocated 524,288 bytes (which is 512 KB) of stack memory.
Fiber Structure:
Aside from the stack, the fiber itself has a control structure, which occupies approximately 456 bytes.
Note on External Constraints:
Option name and operating principle
It is suggested to name the option (under discussion):
How it works:
When the number of live user fibers reaches `client_fiber_limit = N`, no additional fibers are created, and the next attempt to create a fiber returns an error, for example: …

This gives a clear signal: the limit has been reached, and no new fibers will be started. The system does not wait for resources to be freed or remove guard pages -- it simply refuses to create a new fiber.