prep release: v2.1.0 #7109

Merged
abernix merged 10 commits into 2.1.0 from prep-2.1.0 on Mar 26, 2025

abernix (Member) commented Mar 25, 2025

Note

When approved, this PR will merge into the 2.1.0 branch which will — upon being approved itself — merge into dev.

Things to review in this PR:

  • Changelog correctness (There is a preview below, but it is not necessarily the most up to date. See the Files Changed for the true reality.)
  • Version bumps
  • That it targets the right release branch (2.1.0 in this case!).

🚀 Features

Add metric to measure cardinality overflow frequency (PR #6998)

Adds a new counter metric, apollo.router.telemetry.metrics.cardinality_overflow, which is incremented whenever opentelemetry-rust emits its cardinality overflow log. This log means that a metric in a batch has exceeded a cardinality of 2000 and that any excess attributes will be ignored.

By @rregitsky in #6998

Introduce PQ manifest hot_reload option for local manifests (PR #6987)

This change introduces a persisted_queries.hot_reload configuration option to allow the router to hot reload local PQ manifest changes.

If you configure local_manifests, you can set hot_reload to true to automatically reload manifest files whenever they change. This lets you update local manifest files without restarting the router.

persisted_queries:
  enabled: true
  local_manifests:
    - ./path/to/persisted-query-manifest.json
  hot_reload: true

Note: This change explicitly does not piggyback on the existing --hot-reload flag.

By @trevor-scheer in #6987

Add metrics for value completion errors (PR #6905)

When the router encounters a value completion error, the error is not included in the GraphQL errors array, which makes it harder to observe. To surface these errors more clearly, the router now counts them via the metric instruments apollo.router.graphql.error and apollo.router.operations.error, where they are distinguishable by the code attribute with value RESPONSE_VALIDATION_FAILED.

By @timbotnik in #6905

Changes to experimental error metrics (PR #6966)

In 2.0.0, the experimental option telemetry.apollo.errors.experimental_otlp_error_metrics was introduced to track errors with additional attributes. A few related changes are included here:

  • Sending these metrics now also respects the subgraph's send flag, for example telemetry.apollo.errors.subgraph.[all|(subgraph name)].send.
  • A new configuration option telemetry.apollo.errors.subgraph.[all|(subgraph name)].redaction_policy has been added. This flag only applies when redact is set to true. When set to ErrorRedactionPolicy.Strict, error redaction will behave as it has in the past. Setting this to ErrorRedactionPolicy.Extended will allow the extensions.code value from subgraph errors to pass through redaction and be sent to Studio.
  • A warning about incompatibility of error telemetry with connectors will be suppressed when this feature is enabled, since it does support connectors when using the new mode.
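
As a rough sketch, the subgraph-level options described above might be combined as follows. The key paths come from the option names in this entry; the lowercase spelling of the redaction_policy value is an assumption (the entry names it as ErrorRedactionPolicy.Extended), so check the configuration reference before relying on it.

```yaml
telemetry:
  apollo:
    errors:
      subgraph:
        all:
          # Error metrics are only sent when send is true.
          send: true
          # redaction_policy only applies when redact is true.
          redact: true
          # Assumed spelling of ErrorRedactionPolicy.Extended: lets
          # extensions.code from subgraph errors pass through redaction.
          redaction_policy: extended
```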

By @timbotnik in #6966

Add router config validate subcommand (PR #7016)

Adds a new router config validate subcommand that validates a router config file without fully starting the router.

./router config validate <path-to-config-file.yaml>

By @andrewmcgivery in #7016

Support traffic shaping for connectors (PR #6737)

Traffic shaping is now supported for connectors. To target a specific source, use the subgraph_name.source_name under the new connector.sources property of traffic_shaping. Settings under connector.all will apply to all connectors. deduplicate_query is not supported at this time.

Example config:

traffic_shaping:
  connector:
    all:
      timeout: 5s
    sources:
      connector-graph.random_person_api:
        global_rate_limit:
          capacity: 20
          interval: 1s
        experimental_http2: http2only
        timeout: 1s

By @andrewmcgivery in #6737

Add apollo.router.pipelines metrics (PR #6967)

When the router reloads, either via schema change or config change, a new request pipeline is created.
Existing request pipelines are closed once their requests finish. However, this may not happen if long-running requests, such as subscriptions, never finish.

To enable debugging when request pipelines are being kept around, a new gauge metric has been added:

  • apollo.router.pipelines - The number of request pipelines active in the router
    • schema.id - The Apollo Studio schema hash associated with the pipeline.
    • launch.id - The Apollo Studio launch id associated with the pipeline (optional).
    • config.hash - The hash of the configuration

By @BrynCooke in #6967

Update JWT handling (PR #6930)

This PR updates JWT handling in the AuthenticationPlugin:

  • Users may now set a new config option config.authentication.router.jwt.on_error.
    • When set to the default Error, JWT-related errors will be returned to users (the current behavior).
    • When set to Continue, JWT errors will instead be ignored, and JWT claims will not be set in the request context.
  • When JWTs are processed, whether processing succeeds or fails, the request context will contain a new variable apollo::authentication::jwt_status which notes the result of processing.
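
A minimal sketch of the new option, assuming on_error sits alongside the existing JWT settings under authentication.router.jwt. The JWKS URL is a placeholder, not a real endpoint.

```yaml
authentication:
  router:
    jwt:
      jwks:
        # Placeholder URL for illustration only.
        - url: https://example.com/.well-known/jwks.json
      # Error (the default) returns JWT-related errors to the client.
      # Continue ignores them and leaves the JWT claims unset in the context.
      on_error: Continue
```

With Continue, downstream logic can inspect apollo::authentication::jwt_status in the request context to see how processing ended.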

By @Velfi in #6930

Add support to get/set URI scheme in Rhai (Issue #6897)

This adds support to read and write the scheme from the request.uri.scheme/request.subgraph.uri.scheme functions in Rhai,
enabling the ability to switch between http and https for subgraph fetches. For example:

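fn subgraph_service(service, subgraph){
    service.map_request(|request|{
        // Log the scheme as it was set on the incoming subgraph request.
        log_info(`${request.subgraph.uri.scheme}`);
        if request.subgraph.uri.scheme == {} {
            log_info("Scheme is not explicitly set");
        }
        // Rewrite the subgraph URI, switching the scheme to https.
        request.subgraph.uri.scheme = "https";
        request.subgraph.uri.host = "api.apollographql.com";
        request.subgraph.uri.path = "/api/graphql";
        request.subgraph.uri.port = 1234;
        // Log the scheme again to confirm the change.
        log_info(`${request.subgraph.uri.scheme}`);
    });
}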

By @starJammer in #6906

Add apollo.router.open_connections metric (PR #7023)

To help users diagnose when connections are keeping pipelines hanging around, the following metric has been added:

  • apollo.router.open_connections - The number of open connections to the router
    • schema.id - The Apollo Studio schema hash associated with the pipeline.
    • launch.id - The Apollo Studio launch id associated with the pipeline (optional).
    • config.hash - The hash of the configuration.
    • server.address - The address that the router is listening on.
    • server.port - The port that the router is listening on if not a unix socket.
    • http.connection.state - Either active or terminating.

You can use this metric to monitor when connections are open via long running requests or keepalive messages.

By @BrynCooke in #7023

Add batching.maximum_size configuration option to limit maximum client batch size (PR #7005)

Add an optional maximum_size parameter to the batching configuration.

  • When specified, the router will reject requests which contain more than maximum_size queries in the client batch.
  • When unspecified, the router performs no size checking (the current behavior).
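
For illustration, a batching configuration with the new limit might look like the sketch below. The enabled and mode keys are assumed from the router's existing client batching configuration, and the value 10 is arbitrary.

```yaml
batching:
  enabled: true
  mode: batch_http_link
  # Reject any client batch containing more than 10 queries.
  # Omit this key to keep the current behavior (no size check).
  maximum_size: 10
```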

If the number of queries provided exceeds the maximum batch size, the entire batch fails with error code 422 (Unprocessable Content). For example:

{
  "errors": [
    {
      "message": "Invalid GraphQL request",
      "extensions": {
        "details": "Batch limits exceeded: you provided a batch with 3 entries, but the configured maximum router batch size is 2",
        "code": "BATCH_LIMIT_EXCEEDED"
      }
    }
  ]
}

By @carodewig in #7005

Support TLS configuration for connectors (PR #6995)

Connectors now support TLS configuration for using custom certificate authorities and client certificate authentication.

tls:
  connector:
    sources:
      connector-graph.random_person_api:
        certificate_authorities: 
        client_authentication:
          certificate_chain: 
          key: 

By @andrewmcgivery in #6995

Enable remote proxy downloads of the Router

This lets users without direct download access specify a remote proxy mirror location for downloading Apollo Router releases from GitHub.

By @LongLiveCHIEF in #6667

Add span events to error spans for connectors and demand control plugin (PR #6727)

New span events have been added to trace spans which include errors. These span events include the GraphQL error code that relates to the error. So far, this only includes errors generated by connectors and the demand control plugin.

By @bonnici in #6727

🐛 Fixes

Export gauge instruments (Issue #6859)

Previously in router 2.x, when using the router's OTel meter_provider() to report metrics from Rust plugins, gauge instruments such as those created using .u64_gauge() weren't exported. The router now exports these instruments.

By @yanns in #6865

Use batch_processor config for Apollo metrics PeriodicReader (PR #7024)

The Apollo OTLP batch_processor configurations telemetry.apollo.batch_processor.scheduled_delay and telemetry.apollo.batch_processor.max_export_timeout now also control the Apollo OTLP PeriodicReader export interval and timeout, respectively. This update brings parity between Apollo OTLP metrics and non-Apollo OTLP exporter metrics.
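
As a sketch, the two settings named above could be configured like this. The durations are illustrative values, not recommendations or defaults.

```yaml
telemetry:
  apollo:
    batch_processor:
      # Now also used as the Apollo OTLP PeriodicReader export interval.
      scheduled_delay: 5s
      # Now also used as the Apollo OTLP PeriodicReader export timeout.
      max_export_timeout: 30s
```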

By @rregitsky in #7024

Reduce Brotli encoding compression level (Issue #6857)

The Brotli encoding compression level has been changed from 11 to 4 to improve performance and to match the fast setting of other compression algorithms. This level is also much more reasonable for dynamic workloads.

By @carodewig in #7007

CPU count inference improvements for cgroup environments (PR #6787)

This fixes an issue where the fleet_detector plugin would not correctly infer the CPU limits for a system which used cgroup or cgroup2.

By @nmoutschen in #6787

Separate entity keys and representation variables in entity cache key (Issue #6673)

This fix separates the entity keys and representation variable values in the cache key, to avoid issues with @requires for example.

Important

If you have enabled Distributed query plan caching, this release contains changes which necessarily alter the hashing algorithm used for the cache keys. On account of this, you should anticipate additional cache regeneration cost when updating between these versions while the new hashing algorithm comes into service.

By @bnjjj in #6888

Replace Rhai-specific hot-reload functionality with general hot-reload (PR #6950)

In Router 2.0, the Rhai hot-reload capability was not working. This was because architectural improvements to the router meant that the entire service stack was no longer re-created for each request.

The fix adds the Rhai source files to the primary list of elements watched by the router (configuration, schema, and so on) and removes the old Rhai-specific file watching logic.

If --hot-reload is enabled, the router will reload on changes to Rhai source code just like it would for changes to configuration, for example.
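
As a rough sketch, a Rhai plugin configuration such as the one below would now be covered by --hot-reload. The directory and file name are placeholders chosen for illustration.

```yaml
rhai:
  # Directory containing Rhai source files (placeholder path).
  scripts: ./rhai
  # Entry-point script inside that directory (placeholder name).
  main: main.rhai
```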

By @garypen in #6950

📃 Configuration

Make experimental OTLP error metrics feature flag non-experimental (PR #7033)

Because the OTLP error metrics feature is being promoted to preview from experimental, this change updates its feature flag name from experimental_otlp_error_metrics to preview_extended_error_metrics.
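
In configuration, the rename might look like the following sketch. The key path follows the flag's location under telemetry.apollo.errors given earlier in this changelog; the value enabled is an assumption and should be checked against the configuration reference.

```yaml
telemetry:
  apollo:
    errors:
      # Formerly experimental_otlp_error_metrics.
      # The value "enabled" is assumed, not confirmed by this changelog.
      preview_extended_error_metrics: enabled
```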

By @merylc in #7033

Tip

All notable changes to Router v2.x after its initial release will be documented in this file. To see previous history, see the changelog prior to v2.0.0.

@abernix abernix requested review from a team as code owners March 25, 2025 12:28
svc-apollo-docs (Collaborator) commented Mar 25, 2025

⚠️ Docs preview not attached to branch

The preview was not built because the PR's base branch 2.1.0 is not in the list of sources.

An Apollo team member can comment one of the following commands to dictate which branch to attach the preview to:

  • !docs set-base-branch 1.x
  • !docs set-base-branch dev

Build ID: 219c602f47a6957ce69f30d4

@abernix abernix requested a review from a team March 25, 2025 18:14
carodewig (Contributor) left a comment

Various nit-picky suggestions to improve the consistency and clarity of the changelog. Please don't hesitate to reject any/all - I may have inadvertently just imposed my own style preferences rather than making meaningful contributions 😅

NB: I do think it might be helpful to reorganize some of the changes - perhaps picking the top few features and then bucketing the rest by rough categories would improve the flow?

Velfi and others added 2 commits March 25, 2025 15:56
Co-authored-by: Caroline Rodewig
Co-authored-by: Caroline Rodewig
abernix and others added 4 commits March 26, 2025 11:31
BrynCooke previously approved these changes Mar 26, 2025
lrlna previously approved these changes Mar 26, 2025
garypen previously approved these changes Mar 26, 2025
@abernix abernix dismissed stale reviews from garypen, lrlna, and BrynCooke via a9b6b34 March 26, 2025 11:10
@abernix abernix merged commit cfd1cce into 2.1.0 Mar 26, 2025
9 of 10 checks passed
@abernix abernix deleted the prep-2.1.0 branch March 26, 2025 11:16
abernix added a commit that referenced this pull request Mar 26, 2025
Fixes the mistakes I made while landing "Ordering" in
the prep PR for 2.1.0: #7109.

Ref: a9b6b34

10 participants