Description
TelemetryThreadPool not scaling beyond 1 thread
-
What version of JDBC driver are you using?
3.23.1
-
What operating system and processor architecture are you using?
macOS 14.5.0 (darwin 24.5.0) - but this affects all platforms
-
What version of Java are you using?
temurin 24
-
What did you do?
Issue Description:
The TelemetryThreadPool class uses a ThreadPoolExecutor with an unbounded LinkedBlockingQueue and expects it to scale up to the maximum pool size (10 threads) when there are queued telemetry tasks. However, because of how ThreadPoolExecutor works with unbounded queues, it does not create threads beyond the core pool size as long as the queue accepts tasks. With a core pool size of 0, this typically means a single worker thread.
Current Configuration:
uploader =
    new ThreadPoolExecutor(
        0, // core size
        10, // max size - EFFECTIVELY IGNORED
        1, // keep alive time
        TimeUnit.SECONDS,
        new LinkedBlockingQueue<>() // unbounded queue - CAUSES THE ISSUE
    );
Root Cause:
According to the ThreadPoolExecutor documentation and confirmed by this Stack Overflow answer:
"Using an unbounded queue (for example a LinkedBlockingQueue without a predefined capacity) will cause new tasks to wait in the queue when all corePoolSize threads are busy. Thus, no more than corePoolSize threads will ever be created. (And the value of the maximumPoolSize therefore doesn't have any effect.)"
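The behavior quoted above can be observed outside the driver. The following is a minimal standalone sketch, not driver code: the class name `UnboundedQueueDemo` and the helper `largestPoolSize` are made up for illustration, and the blocking task only stands in for a slow telemetry upload. It uses the same constructor arguments as the configuration in this report and submits all tasks from a single thread, which avoids the startup race mentioned in the reproduction notes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UnboundedQueueDemo {

    // Hypothetical helper: submits `tasks` blocking tasks from one thread and
    // returns the largest number of worker threads the pool ever held.
    static int largestPoolSize(int corePoolSize, int maxPoolSize, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            corePoolSize, maxPoolSize, 1, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // stand-in for a slow telemetry upload
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Every task after the first is accepted by the unbounded queue,
            // so the executor never needs to add a second worker.
            return pool.getLargestPoolSize();
        } finally {
            release.countDown(); // let the queued tasks drain
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("largest pool size: " + largestPoolSize(0, 10, 50));
    }
}
```

With a core size of 0 and a maximum of 10, this should report a largest pool size of 1 even though 50 tasks are waiting, which matches the single `/telemetry/send` thread seen in the debug logs.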
Reproduction:
- Configure your test client to connect to a Snowflake instance in a region with higher latency (this eases reproduction of the issue, but the issue also impacts lower latency connections)
- Enable debug logging for net.snowflake.client.jdbc.RestRequest
- Initialize the driver and execute queries with increasing concurrency, ramping up over at least a few minutes
- Observe from the debug logs that calls to /telemetry/send are all executed from 1 thread
- Verify from a heap dump that the LinkedBlockingQueue is not empty
Note that there is also a race condition in the thread pool executor logic for spawning new threads when the pool is empty. If a high volume of new telemetry batch requests is submitted concurrently while the pool is empty, the executor may spawn more than one thread, and the resulting thread count is nondeterministic.
-
What did you expect to see?
Expected Behavior:
When telemetry tasks are queued and not being processed quickly enough, the thread pool should create additional threads up to the maximum pool size (10) to process the backlog in parallel.
Actual Behavior:
- Thread pool rarely scales beyond core pool size due to unbounded queue
- Telemetry tasks queue up instead of being processed in parallel
- Inconsistent thread count (1-10) due to race conditions and special handling of corePoolSize=0
- Query performance degradation during high concurrency load
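One common way to get the expected scaling while keeping an unbounded queue is to set the core pool size equal to the maximum and let core threads time out. The sketch below is only a suggestion under that assumption, not the driver's actual fix: `ScalingPoolSketch`, `newScalingPool`, and `largestPoolSizeUnderLoad` are hypothetical names, and the value 10 and the 1 second keep-alive are taken from the current configuration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ScalingPoolSketch {

    // Hypothetical replacement configuration (assumption, not driver code):
    // core == max, so each new task starts a worker until the limit is hit.
    static ThreadPoolExecutor newScalingPool(int maxThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads, maxThreads, 1, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        // Idle core threads exit after the keep-alive time, so the pool can
        // still shrink to zero threads when there is no telemetry to send.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    // Submits `tasks` blocking tasks and returns the largest pool size seen.
    static int largestPoolSizeUnderLoad(int maxThreads, int tasks) {
        ThreadPoolExecutor pool = newScalingPool(maxThreads);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // stand-in for a slow telemetry upload
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getLargestPoolSize();
        } finally {
            release.countDown(); // let the queued tasks drain
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("largest pool size: " + largestPoolSizeUnderLoad(10, 50));
    }
}
```

Under the same 50-task load, this configuration should reach 10 worker threads. The trade-off is that a new thread is started for each submission until the limit is reached, even if an existing worker is idle, so it favors throughput over minimal thread count.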
-
Can you set logging to DEBUG and collect the logs?
I can't share our complete debug log but I can extract the relevant logs as described above if you think that would be helpful.