SNOW-2159000 TelemetryThreadPool not scaling beyond 1 thread #2219

Closed

@matthagenbuch

Description

Please answer these questions before submitting your issue.
In order to accurately debug the issue this information is required. Thanks!

  1. What version of JDBC driver are you using?

    3.23.1

  2. What operating system and processor architecture are you using?

    macOS 14.5.0 (darwin 24.5.0) - but this affects all platforms

  3. What version of Java are you using?

    temurin 24

  4. What did you do?

    Issue Description:
    The TelemetryThreadPool class uses a ThreadPoolExecutor with an unbounded LinkedBlockingQueue and expects it to scale up to the maximum pool size (10 threads) when telemetry tasks are queued. However, because of how ThreadPoolExecutor interacts with unbounded queues, it never creates more than corePoolSize threads; with corePoolSize = 0, the executor's special-case handling keeps at most one worker alive while the queue is non-empty.

    Current Configuration:

    uploader = new ThreadPoolExecutor(
        0, // core size
        10, // max size - EFFECTIVELY IGNORED
        1, // keep alive time
        TimeUnit.SECONDS,
        new LinkedBlockingQueue<>() // unbounded queue - CAUSES THE ISSUE
    );

    Root Cause:
    According to the ThreadPoolExecutor Javadoc, as summarized in a widely cited Stack Overflow answer:

    "Using an unbounded queue (for example a LinkedBlockingQueue without a predefined capacity) will cause new tasks to wait in the queue when all corePoolSize threads are busy. Thus, no more than corePoolSize threads will ever be created. (And the value of the maximumPoolSize therefore doesn't have any effect.)"

    Reproduction:

    • Configure your test client to connect to a Snowflake instance in a higher-latency region (this makes the issue easier to reproduce, but lower-latency connections are affected as well)
    • Enable debug logging for net.snowflake.client.jdbc.RestRequest (one way to do this is sketched after this list)
    • Initialize the driver and execute queries with increasing concurrency, ramping up over at least a few minutes
    • Observe from the debug logs that all calls to /telemetry/send are executed from a single thread
    • Verify from a heap dump that the LinkedBlockingQueue is not empty
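
    One way to enable that logging, assuming the driver routes through its default java.util.logging backend (an assumption; adjust for your logging setup, and the class name here is hypothetical), is:

    import java.util.logging.ConsoleHandler;
    import java.util.logging.Level;
    import java.util.logging.Logger;

    public class EnableRestRequestDebugLogging {
        // Keep a strong reference: java.util.logging holds loggers weakly,
        // so an unreferenced logger can lose its configured level.
        private static final Logger REST_REQUEST_LOGGER =
            Logger.getLogger("net.snowflake.client.jdbc.RestRequest");

        public static void enable() {
            ConsoleHandler handler = new ConsoleHandler();
            handler.setLevel(Level.FINE); // FINE corresponds to debug
            REST_REQUEST_LOGGER.setLevel(Level.FINE);
            REST_REQUEST_LOGGER.addHandler(handler);
        }
    }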

    Note that there is also a race condition in the ThreadPoolExecutor logic for spawning a new thread when the pool is empty. If a high volume of new telemetry batch requests is submitted concurrently while the pool has no workers, several submitters can each observe an empty pool and each start a worker, so the executor spawns a nondeterministic number of threads. A sketch of this follows.
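
    A minimal sketch of that race (again standalone demo code, not driver code; whether it reproduces on a given run is timing-dependent):

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class EmptyPoolRaceDemo {
        public static void main(String[] args) throws InterruptedException {
            ThreadPoolExecutor pool =
                new ThreadPoolExecutor(0, 10, 1, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

            // Release many submitters at once while the pool has zero workers.
            // Each submitter can observe workerCount == 0 after enqueueing its
            // task and start its own worker, so several threads may be created.
            int submitters = 20;
            CountDownLatch start = new CountDownLatch(1);
            CountDownLatch done = new CountDownLatch(submitters);
            for (int i = 0; i < submitters; i++) {
                new Thread(() -> {
                    try {
                        start.await();
                        pool.execute(() -> { /* trivial telemetry-like task */ });
                    } catch (InterruptedException ignored) {
                    } finally {
                        done.countDown();
                    }
                }).start();
            }
            start.countDown();
            done.await();

            Thread.sleep(200);
            // Timing-dependent: often 1, sometimes higher.
            System.out.println("largest pool size = " + pool.getLargestPoolSize());
            pool.shutdownNow();
        }
    }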

  5. What did you expect to see?

    Expected Behavior:
    When telemetry tasks are queued and not being processed quickly enough, the thread pool should create additional threads, up to the maximum pool size (10), to process the backlog in parallel. One possible configuration that achieves this is sketched at the end of this answer.

    Actual Behavior:

    • Thread pool rarely scales beyond a single thread, because the unbounded queue accepts every task before the executor considers growing the pool
    • Telemetry tasks queue up instead of being processed in parallel
    • Inconsistent thread count (1-10) due to the empty-pool race condition and the special handling of corePoolSize = 0
    • Query performance degradation under high-concurrency load
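
    One possible way to get the intended scaling, offered as a sketch rather than the definitive fix, is to set corePoolSize equal to maximumPoolSize and allow core threads to time out; the unbounded queue then only holds overflow once all 10 threads are busy:

    // Sketch of a possible fix (not necessarily the change the driver team made):
    // with corePoolSize == maximumPoolSize, the executor creates up to 10 threads
    // before queueing, and allowCoreThreadTimeOut lets all of them exit when idle.
    ThreadPoolExecutor uploader =
        new ThreadPoolExecutor(
            10, // core size == max size, so the pool actually grows to 10
            10, // max size
            1, // keep alive time
            TimeUnit.SECONDS,
            new LinkedBlockingQueue<>());
    uploader.allowCoreThreadTimeOut(true); // idle threads terminate after 1 second
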
  6. Can you set logging to DEBUG and collect the logs?

    I can't share our complete debug log but I can extract the relevant logs as described above if you think that would be helpful.

Metadata

Labels

  • bug
  • status-fixed_awaiting_release: The issue has been fixed, its PR merged, and now awaiting the next release cycle of the connector.
  • status-triage_done: Initial triage done, will be further handled by the driver team.
