SNOW-2159000 TelemetryThreadPool not scaling beyond 1 thread #2219

Closed

@matthagenbuch

Description

Please answer these questions before submitting your issue.
In order to accurately debug the issue this information is required. Thanks!

  1. What version of JDBC driver are you using?

    3.23.1

  2. What operating system and processor architecture are you using?

    macOS 14.5.0 (darwin 24.5.0) - but this affects all platforms

  3. What version of Java are you using?

    temurin 24

  4. What did you do?

    Issue Description:
    The TelemetryThreadPool class uses a ThreadPoolExecutor with an unbounded LinkedBlockingQueue and expects it to scale up to the maximum pool size (10 threads) when telemetry tasks are queued. However, because of how ThreadPoolExecutor interacts with unbounded queues, it never creates more than corePoolSize threads; with corePoolSize = 0, the executor's special-case handling keeps at most one worker alive while the queue is non-empty.

    Current Configuration:

    uploader = new ThreadPoolExecutor(
        0, // core size
        10, // max size - EFFECTIVELY IGNORED
        1, // keep alive time
        TimeUnit.SECONDS,
        new LinkedBlockingQueue<>() // unbounded queue - CAUSES THE ISSUE
    );

    Root Cause:
    According to the ThreadPoolExecutor Javadoc, as summarized in a widely cited Stack Overflow answer:

    "Using an unbounded queue (for example a LinkedBlockingQueue without a predefined capacity) will cause new tasks to wait in the queue when all corePoolSize threads are busy. Thus, no more than corePoolSize threads will ever be created. (And the value of the maximumPoolSize therefore doesn't have any effect.)"

    Reproduction:

    • Configure your test client to connect to a Snowflake instance in a higher-latency region (this makes the issue easier to reproduce, but lower-latency connections are affected as well)
    • Enable debug logging for net.snowflake.client.jdbc.RestRequest (one way to do this is sketched after this list)
    • Initialize the driver and execute queries with increasing concurrency, ramping up over at least a few minutes
    • Observe from the debug logs that all calls to /telemetry/send are executed from a single thread
    • Verify from a heap dump that the LinkedBlockingQueue is not empty
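
    One way to enable that logging, assuming the driver routes through its default java.util.logging backend (an assumption; adjust for your logging setup, and the class name here is hypothetical), is:

    import java.util.logging.ConsoleHandler;
    import java.util.logging.Level;
    import java.util.logging.Logger;

    public class EnableRestRequestDebugLogging {
        // Keep a strong reference: java.util.logging holds loggers weakly,
        // so an unreferenced logger can lose its configured level.
        private static final Logger REST_REQUEST_LOGGER =
            Logger.getLogger("net.snowflake.client.jdbc.RestRequest");

        public static void enable() {
            ConsoleHandler handler = new ConsoleHandler();
            handler.setLevel(Level.FINE); // FINE corresponds to debug
            REST_REQUEST_LOGGER.setLevel(Level.FINE);
            REST_REQUEST_LOGGER.addHandler(handler);
        }
    }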

    Note that there is also a race condition in the ThreadPoolExecutor logic for spawning a new thread when the pool is empty. If a high volume of new telemetry batch requests is submitted concurrently while the pool has no workers, several submitters can each observe an empty pool and each start a worker, so the executor spawns a nondeterministic number of threads. A sketch of this follows.
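
    A minimal sketch of that race (again standalone demo code, not driver code; whether it reproduces on a given run is timing-dependent):

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class EmptyPoolRaceDemo {
        public static void main(String[] args) throws InterruptedException {
            ThreadPoolExecutor pool =
                new ThreadPoolExecutor(0, 10, 1, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

            // Release many submitters at once while the pool has zero workers.
            // Each submitter can observe workerCount == 0 after enqueueing its
            // task and start its own worker, so several threads may be created.
            int submitters = 20;
            CountDownLatch start = new CountDownLatch(1);
            CountDownLatch done = new CountDownLatch(submitters);
            for (int i = 0; i < submitters; i++) {
                new Thread(() -> {
                    try {
                        start.await();
                        pool.execute(() -> { /* trivial telemetry-like task */ });
                    } catch (InterruptedException ignored) {
                    } finally {
                        done.countDown();
                    }
                }).start();
            }
            start.countDown();
            done.await();

            Thread.sleep(200);
            // Timing-dependent: often 1, sometimes higher.
            System.out.println("largest pool size = " + pool.getLargestPoolSize());
            pool.shutdownNow();
        }
    }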

  5. What did you expect to see?

    Expected Behavior:
    When telemetry tasks are queued and not being processed quickly enough, the thread pool should create additional threads, up to the maximum pool size (10), to process the backlog in parallel. One possible configuration that achieves this is sketched at the end of this answer.

    Actual Behavior:

    • Thread pool rarely scales beyond a single thread, because the unbounded queue accepts every task before the executor considers growing the pool
    • Telemetry tasks queue up instead of being processed in parallel
    • Inconsistent thread count (1-10) due to the empty-pool race condition and the special handling of corePoolSize = 0
    • Query performance degradation under high-concurrency load
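
    One possible way to get the intended scaling, offered as a sketch rather than the definitive fix, is to set corePoolSize equal to maximumPoolSize and allow core threads to time out; the unbounded queue then only holds overflow once all 10 threads are busy:

    // Sketch of a possible fix (not necessarily the change the driver team made):
    // with corePoolSize == maximumPoolSize, the executor creates up to 10 threads
    // before queueing, and allowCoreThreadTimeOut lets all of them exit when idle.
    ThreadPoolExecutor uploader =
        new ThreadPoolExecutor(
            10, // core size == max size, so the pool actually grows to 10
            10, // max size
            1, // keep alive time
            TimeUnit.SECONDS,
            new LinkedBlockingQueue<>());
    uploader.allowCoreThreadTimeOut(true); // idle threads terminate after 1 second
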
  6. Can you set logging to DEBUG and collect the logs?

    I can't share our complete debug log but I can extract the relevant logs as described above if you think that would be helpful.

Metadata

Labels

  • bug
  • status-fixed_awaiting_release: The issue has been fixed, its PR merged, and now awaiting the next release cycle of the connector.
  • status-triage_done: Initial triage done, will be further handled by the driver team.
