Releases: cortexproject/cortex

Cortex 1.13.0-rc.1

06 Jul 23:16
78865aa
Pre-release

Some bug fixes and improvements over 1.13.0-rc.0.

  • Removing fingerprint calculator from the lock on the GetSeries API to improve latency - #4765
  • Ensure that compaction with shuffle sharding continues for blocks with an incomplete time range - #4771
  • Fix cortex_compactor_remaining_planned_compactions not being set after plan - #4772

Cortex 1.13.0-rc.0

23 Jun 23:13
Pre-release

This release contains 112 contributions from 51 contributors. Thank you!

Some notable new features in this release are:

  • Streaming capabilities in Querier for metadata APIs.
  • Experimental shuffle sharding support for compactor, which enables parallel compaction.

Some notable enhancements and bug fixes in this release are:

  • New block storage configurations for Azure that allow a reduction in memory usage.
  • Memory leak fix in Distributor and Ruler.
  • Jitter in Memberlist rejoin interval that reduces CPU utilization during rejoin.
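
The jitter item above (#4747) relies on a general technique: randomizing a fixed interval so that many members do not act at the same instant. The following is a minimal Python sketch of that idea, not Cortex's Go implementation. The function name `jittered_interval` and the 20% jitter factor are assumptions chosen for illustration.

```python
import random


def jittered_interval(base_seconds: float, jitter_factor: float = 0.2, rng=random) -> float:
    """Return base_seconds shifted by a random offset of up to +/- jitter_factor.

    Hypothetical helper: the range and distribution Cortex actually uses
    for its rejoin interval may differ.
    """
    if base_seconds < 0 or not 0 <= jitter_factor <= 1:
        raise ValueError("base_seconds must be >= 0 and jitter_factor in [0, 1]")
    # Uniform offset in [-base * factor, +base * factor].
    offset = base_seconds * jitter_factor * rng.uniform(-1.0, 1.0)
    return base_seconds + offset
```

With a 60 second base and the assumed 20% factor, each call returns a value between 48 and 72 seconds, so members that started together drift apart over successive rejoins.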

Cortex

  • [CHANGE] Changed default for -ingester.min-ready-duration from 1 minute to 15 seconds. #4539
  • [CHANGE] query-frontend: Do not print anything in the logs of query-frontend if an in-progress query has been canceled (context canceled) to avoid spam. #4562
  • [CHANGE] Compactor block deletion mark migration, needed when upgrading from v1.7, is now disabled by default. #4597
  • [CHANGE] The status_code label on gRPC client metrics has changed from '200' and '500' to '2xx', '5xx', '4xx', 'cancel' or 'error'. #4601
  • [CHANGE] Memberlist: changed probe interval from 1s to 5s and probe timeout from 500ms to 2s. #4601
  • [CHANGE] Fix incorrectly named cortex_cache_fetched_keys and cortex_cache_hits metrics. Renamed to cortex_cache_fetched_keys_total and cortex_cache_hits_total respectively. #4686
  • [CHANGE] Enable Thanos series limiter in store-gateway. #4702
  • [CHANGE] Distributor: Apply max_fetched_series_per_query limit for /series API. #4683
  • [CHANGE] Re-enable the proxy_url option for receiver configuration. #4741
  • [FEATURE] Ruler: Add external_labels option to tag all alerts with a given set of labels. #4499
  • [FEATURE] Compactor: Add -compactor.skip-blocks-with-out-of-order-chunks-enabled configuration to mark blocks whose index contains out-of-order chunks for no compaction instead of halting the compaction. #4707
  • [FEATURE] Querier/Query-Frontend: Add -querier.per-step-stats-enabled and -frontend.cache-queryable-samples-stats configurations to enable query sample statistics. #4708
  • [FEATURE] Add shuffle sharding for the compactor #4433
  • [FEATURE] Querier: Use streaming for ingester metadata APIs. #4725
  • [ENHANCEMENT] Update Go version to 1.17.8. #4602 #4604 #4658
  • [ENHANCEMENT] Keep track of discarded samples due to bad relabel configuration in cortex_discarded_samples_total. #4503
  • [ENHANCEMENT] Ruler: Add -ruler.disable-rule-group-label to disable the rule_group label on exported metrics. #4571
  • [ENHANCEMENT] Query federation: improve performance in MergeQueryable by memoizing labels. #4502
  • [ENHANCEMENT] Added new ring-related config -ingester.readiness-check-ring-health. When enabled, the readiness probe succeeds only after all instances are ACTIVE and healthy in the ring. This is enabled by default. #4539
  • [ENHANCEMENT] Added new ring-related config -distributor.excluded-zones. When set, it excludes the comma-separated zones from the ring. The default is "". #4539
  • [ENHANCEMENT] Upgraded Docker base images to alpine:3.14. #4514
  • [ENHANCEMENT] Updated Prometheus to latest. Includes changes from prometheus#9239, adding 15 new functions. Multiple TSDB bugfixes prometheus#9438 & prometheus#9381. #4524
  • [ENHANCEMENT] Query Frontend: Add setting -frontend.forward-headers-list in frontend to configure the set of headers from the requests to be forwarded to downstream requests. #4486
  • [ENHANCEMENT] Blocks storage: Add -blocks-storage.azure.http.*, -alertmanager-storage.azure.http.*, and -ruler-storage.azure.http.* to configure the Azure storage client. #4581
  • [ENHANCEMENT] Optimise memberlist receive path when used as a backing store for rings with a large number of members. #4601
  • [ENHANCEMENT] Add length and limit to labelNameTooLongError and labelValueTooLongError #4595
  • [ENHANCEMENT] Add jitter to rejoinInterval. #4747
  • [ENHANCEMENT] Compactor: upload blocks' no-compaction marks to the global location and introduce a new metric: #4729
    • cortex_bucket_blocks_marked_for_no_compaction_count: Total number of blocks marked for no compaction in the bucket.
  • [ENHANCEMENT] Querier: Reduce the number of series that are kept in memory while streaming from ingesters. #4745
  • [BUGFIX] AlertManager: remove stale template files. #4495
  • [BUGFIX] Distributor: fix bug in query-exemplar where some results would get dropped. #4583
  • [BUGFIX] Update Thanos dependency: compactor tracing support, azure blocks storage memory fix. #4585
  • [BUGFIX] Set appropriate Content-Type header for /services endpoint, which previously hard-coded text/plain. #4596
  • [BUGFIX] Querier: Disable query scheduler SRV DNS lookup, which removes noisy log messages about "failed DNS SRV record lookup". #4601
  • [BUGFIX] Memberlist: fixed corrupted packets when sending compound messages with more than 255 messages or messages bigger than 64KB. #4601
  • [BUGFIX] Query Frontend: If 'LogQueriesLongerThan' is set to < 0, log all queries as described in the docs. #4633
  • [BUGFIX] Distributor: update defaultReplicationStrategy to not fail with extend-write when a single instance is unhealthy. #4636
  • [BUGFIX] Distributor: Fix race condition on /series introduced by #4683. #4716
  • [BUGFIX] Ruler: Fixed leaking notifiers after users are removed #4718
  • [BUGFIX] Distributor: Fix a memory leak in distributor due to the cluster label. #4739
  • [BUGFIX] Memberlist: Avoid clock skew by limiting the timestamp accepted on gossip. #4750
  • [BUGFIX] Compactor: skip compaction if there is only 1 block available for shuffle-sharding compactor. #4756
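
The status_code label change listed above (#4601) replaces exact codes such as '200' and '500' with coarser values. A minimal Python sketch of how a call outcome could be bucketed into those values follows. It is an illustration under stated assumptions, not the code Cortex uses: the `Cancelled` class and the `status_code_label` function are hypothetical names.

```python
from typing import Optional


class Cancelled(Exception):
    """Hypothetical stand-in for a request that was cancelled (context canceled)."""


def status_code_label(http_status: Optional[int] = None,
                      error: Optional[Exception] = None) -> str:
    """Map a call outcome to one of '2xx', '4xx', '5xx', 'cancel' or 'error'."""
    if isinstance(error, Cancelled):
        return "cancel"
    if http_status is not None and 100 <= http_status <= 599:
        # Keep only the status family, e.g. 503 -> '5xx'.
        return f"{http_status // 100}xx"
    if error is not None:
        # A failure that carries no HTTP-style status code.
        return "error"
    # No error and no explicit status is treated as success in this sketch.
    return "2xx"
```

Dashboards and alerts that match the old exact values (for example status_code="500") need their selectors updated to the new values.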

Cortex 1.11.1

11 Mar 09:12
v1.11.1
d7188d3

This is a security release to include the fix for CVE-2022-24921 "stack exhaustion via a deeply nested expression".

The fix was to rebuild with Go v1.16.15, at #4663.

Cortex 1.11.0

25 Nov 17:23
43c646b

This release contains 76 contributions from 31 authors. Thank you!

A broad range of improvements, including support for cloud services such as Memcached auto-discovery and Amazon SNS.

Cortex

  • [CHANGE] Memberlist: Expose default configuration values to the command line options. Note that setting these explicitly to zero will no longer cause the default to be used. If the default is desired, then do not set the option. The following are affected: #4276
    • -memberlist.stream-timeout
    • -memberlist.retransmit-factor
    • -memberlist.pull-push-interval
    • -memberlist.gossip-interval
    • -memberlist.gossip-nodes
    • -memberlist.gossip-to-dead-nodes-time
    • -memberlist.dead-node-reclaim-time
  • [CHANGE] -querier.max-fetched-chunks-per-query previously applied to chunks from ingesters and store separately; now the two combined should not exceed the limit. #4260
  • [CHANGE] Memberlist: the metric memberlist_kv_store_value_bytes has been removed due to values no longer being stored in-memory as encoded bytes. #4345
  • [CHANGE] Some files and directories created by Cortex components on local disk now have stricter permissions, and are only readable by owner, but not group or others. #4394
  • [CHANGE] The metric cortex_deprecated_flags_inuse_total has been renamed to deprecated_flags_inuse_total as part of using grafana/dskit functionality. #4443
  • [FEATURE] Ruler: Add new -ruler.query-stats-enabled which when enabled will report the cortex_ruler_query_seconds_total as a per-user metric that tracks the sum of the wall time of executing queries in the ruler in seconds. #4317
  • [FEATURE] Query Frontend: Add cortex_query_fetched_series_total and cortex_query_fetched_chunks_bytes_total per-user counters to expose the number of series and bytes fetched as part of queries. These metrics can be enabled with the -frontend.query-stats-enabled flag (or its respective YAML config option query_stats_enabled). #4343
  • [FEATURE] AlertManager: Add support for SNS Receiver. #4382
  • [FEATURE] Distributor: Add label status to metric cortex_distributor_ingester_append_failures_total #4442
  • [FEATURE] Queries: Added present_over_time PromQL function, also some TSDB optimisations. #4505
  • [ENHANCEMENT] Add timeout for waiting on compactor to become ACTIVE in the ring. #4262
  • [ENHANCEMENT] Reduce memory used by streaming queries, particularly in ruler. #4341
  • [ENHANCEMENT] Ring: allow experimental configuration of disabling of heartbeat timeouts by setting the relevant configuration value to zero. Applies to the following: #4342
    • -distributor.ring.heartbeat-timeout
    • -ring.heartbeat-timeout
    • -ruler.ring.heartbeat-timeout
    • -alertmanager.sharding-ring.heartbeat-timeout
    • -compactor.ring.heartbeat-timeout
    • -store-gateway.sharding-ring.heartbeat-timeout
  • [ENHANCEMENT] Ring: allow heartbeats to be explicitly disabled by setting the interval to zero. This is considered experimental. This applies to the following configuration options: #4344
    • -distributor.ring.heartbeat-period
    • -ingester.heartbeat-period
    • -ruler.ring.heartbeat-period
    • -alertmanager.sharding-ring.heartbeat-period
    • -compactor.ring.heartbeat-period
    • -store-gateway.sharding-ring.heartbeat-period
  • [ENHANCEMENT] Memberlist: optimized receive path for processing ring state updates, to help reduce CPU utilization in large clusters. #4345
  • [ENHANCEMENT] Memberlist: expose configuration of memberlist packet compression via -memberlist.compression=enabled. #4346
  • [ENHANCEMENT] Update Go version to 1.16.6. #4362
  • [ENHANCEMENT] Updated Prometheus to include changes from prometheus/prometheus#9083. Now whenever /labels API calls include matchers, blocks store is queried for LabelNames with matchers instead of Series calls which was inefficient. #4380
  • [ENHANCEMENT] Querier: performance improvements in socket and memory handling. #4429 #4377
  • [ENHANCEMENT] Exemplars are now emitted for all gRPC calls and many operations tracked by histograms. #4462
  • [ENHANCEMENT] New options -server.http-listen-network and -server.grpc-listen-network allow binding as 'tcp4' or 'tcp6'. #4462
  • [ENHANCEMENT] Rulers: Using shuffle sharding subring on GetRules API. #4466
  • [ENHANCEMENT] Support memcached auto-discovery via auto-discovery flag, introduced by thanos in thanos-io/thanos#4487. Both AWS and Google Cloud memcached service support auto-discovery, which returns a list of nodes of the memcached cluster. #4412
  • [BUGFIX] Fixes a panic in the query-tee when comparing result. #4465
  • [BUGFIX] Frontend: Fixes @ modifier functions (start/end) when splitting queries by time. #4464
  • [BUGFIX] Compactor: compactor will no longer try to compact blocks that are already marked for deletion. Previously compactor would consider blocks marked for deletion within -compactor.deletion-delay / 2 period as eligible for compaction. #4328
  • [BUGFIX] HA Tracker: when cleaning up obsolete elected replicas from KV store, tracker didn't update number of clusters per user correctly. #4336
  • [BUGFIX] Ruler: fixed counting of PromQL evaluation errors as user-errors when updating cortex_ruler_queries_failed_total. #4335
  • [BUGFIX] Ingester: When using block storage, prevent any reads or writes while the ingester is stopping. This will prevent accessing TSDB blocks once they have been already closed. #4304
  • [BUGFIX] Ingester: fixed ingester stuck on start up (LEAVING ring state) when -ingester.heartbeat-period=0 and -ingester.unregister-on-shutdown=false. #4366
  • [BUGFIX] Ingester: panic during shutdown while fetching batches from cache. #4397
  • [BUGFIX] Querier: After query-frontend restart, querier may have lower than configured concurrency. #4417
  • [BUGFIX] Memberlist: forward only changes, not entire original message. #4419
  • [BUGFIX] Memberlist: don't accept old tombstones as incoming change, and don't forward such messages to other gossip members. #4420
  • [BUGFIX] Querier: fixed panic when querying exemplars and using -distributor.shard-by-all-labels=false. #4473
  • [BUGFIX] Querier: honor querier minT,maxT if nil SelectHints are passed to Select(). #4413
  • [BUGFIX] Compactor: fixed panic while collecting Prometheus metrics. #4483
  • [BUGFIX] Update go-kit package to fix spurious log messages #4544
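
The -querier.max-fetched-chunks-per-query change listed above (#4260) can be illustrated with a small Python sketch that contrasts the two ways of applying the limit. The function names are hypothetical and the sketch models only the counting rule described in the changelog entry, not the querier itself.

```python
def within_limit_separately(ingester_chunks: int, store_chunks: int, limit: int) -> bool:
    """Earlier behaviour: each source is checked against the limit on its own."""
    return ingester_chunks <= limit and store_chunks <= limit


def within_limit_combined(ingester_chunks: int, store_chunks: int, limit: int) -> bool:
    """Behaviour after #4260: chunks from both sources together must fit the limit."""
    return ingester_chunks + store_chunks <= limit
```

For example, a query fetching 1500 chunks from ingesters and 1500 from the store passes the separate check with a limit of 2000 but fails the combined one, so limits tuned under the old rule may need to be raised.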

Cortex 1.11.0-rc.1

04 Nov 14:35
2e5ac2a
Pre-release

Over v1.11.0-rc.0, this fixes a problem whereby some debug logs would be output when they were supposed to be filtered out.

This update required a couple more dependencies to be updated, but should not have any visible change.
See #4544

Cortex 1.11.0-rc.0

29 Oct 08:36
919d028
Pre-release

This release contains 76 contributions from 31 authors. Thank you!

A broad range of improvements, including support for cloud services such as Memcached auto-discovery and Amazon SNS.

Cortex

  • [CHANGE] Memberlist: Expose default configuration values to the command line options. Note that setting these explicitly to zero will no longer cause the default to be used. If the default is desired, then do not set the option. The following are affected: #4276
    • -memberlist.stream-timeout
    • -memberlist.retransmit-factor
    • -memberlist.pull-push-interval
    • -memberlist.gossip-interval
    • -memberlist.gossip-nodes
    • -memberlist.gossip-to-dead-nodes-time
    • -memberlist.dead-node-reclaim-time
  • [CHANGE] -querier.max-fetched-chunks-per-query previously applied to chunks from ingesters and store separately; now the two combined should not exceed the limit. #4260
  • [CHANGE] Memberlist: the metric memberlist_kv_store_value_bytes has been removed due to values no longer being stored in-memory as encoded bytes. #4345
  • [CHANGE] Some files and directories created by Cortex components on local disk now have stricter permissions, and are only readable by owner, but not group or others. #4394
  • [CHANGE] The metric cortex_deprecated_flags_inuse_total has been renamed to deprecated_flags_inuse_total as part of using grafana/dskit functionality. #4443
  • [FEATURE] Ruler: Add new -ruler.query-stats-enabled which when enabled will report the cortex_ruler_query_seconds_total as a per-user metric that tracks the sum of the wall time of executing queries in the ruler in seconds. #4317
  • [FEATURE] Query Frontend: Add cortex_query_fetched_series_total and cortex_query_fetched_chunks_bytes_total per-user counters to expose the number of series and bytes fetched as part of queries. These metrics can be enabled with the -frontend.query-stats-enabled flag (or its respective YAML config option query_stats_enabled). #4343
  • [FEATURE] AlertManager: Add support for SNS Receiver. #4382
  • [FEATURE] Distributor: Add label status to metric cortex_distributor_ingester_append_failures_total #4442
  • [FEATURE] Queries: Added present_over_time PromQL function, also some TSDB optimisations. #4505
  • [ENHANCEMENT] Add timeout for waiting on compactor to become ACTIVE in the ring. #4262
  • [ENHANCEMENT] Reduce memory used by streaming queries, particularly in ruler. #4341
  • [ENHANCEMENT] Ring: allow experimental configuration of disabling of heartbeat timeouts by setting the relevant configuration value to zero. Applies to the following: #4342
    • -distributor.ring.heartbeat-timeout
    • -ring.heartbeat-timeout
    • -ruler.ring.heartbeat-timeout
    • -alertmanager.sharding-ring.heartbeat-timeout
    • -compactor.ring.heartbeat-timeout
    • -store-gateway.sharding-ring.heartbeat-timeout
  • [ENHANCEMENT] Ring: allow heartbeats to be explicitly disabled by setting the interval to zero. This is considered experimental. This applies to the following configuration options: #4344
    • -distributor.ring.heartbeat-period
    • -ingester.heartbeat-period
    • -ruler.ring.heartbeat-period
    • -alertmanager.sharding-ring.heartbeat-period
    • -compactor.ring.heartbeat-period
    • -store-gateway.sharding-ring.heartbeat-period
  • [ENHANCEMENT] Memberlist: optimized receive path for processing ring state updates, to help reduce CPU utilization in large clusters. #4345
  • [ENHANCEMENT] Memberlist: expose configuration of memberlist packet compression via -memberlist.compression=enabled. #4346
  • [ENHANCEMENT] Update Go version to 1.16.6. #4362
  • [ENHANCEMENT] Updated Prometheus to include changes from prometheus/prometheus#9083. Now whenever /labels API calls include matchers, blocks store is queried for LabelNames with matchers instead of Series calls which was inefficient. #4380
  • [ENHANCEMENT] Exemplars are now emitted for all gRPC calls and many operations tracked by histograms. #4462
  • [ENHANCEMENT] New options -server.http-listen-network and -server.grpc-listen-network allow binding as 'tcp4' or 'tcp6'. #4462
  • [ENHANCEMENT] Rulers: Using shuffle sharding subring on GetRules API. #4466
  • [ENHANCEMENT] Support memcached auto-discovery via auto-discovery flag, introduced by thanos in thanos-io/thanos#4487. Both AWS and Google Cloud memcached service support auto-discovery, which returns a list of nodes of the memcached cluster. #4412
  • [BUGFIX] Fixes a panic in the query-tee when comparing result. #4465
  • [BUGFIX] Frontend: Fixes @ modifier functions (start/end) when splitting queries by time. #4464
  • [BUGFIX] Compactor: compactor will no longer try to compact blocks that are already marked for deletion. Previously compactor would consider blocks marked for deletion within -compactor.deletion-delay / 2 period as eligible for compaction. #4328
  • [BUGFIX] HA Tracker: when cleaning up obsolete elected replicas from KV store, tracker didn't update number of clusters per user correctly. #4336
  • [BUGFIX] Ruler: fixed counting of PromQL evaluation errors as user-errors when updating cortex_ruler_queries_failed_total. #4335
  • [BUGFIX] Ingester: When using block storage, prevent any reads or writes while the ingester is stopping. This will prevent accessing TSDB blocks once they have been already closed. #4304
  • [BUGFIX] Ingester: fixed ingester stuck on start up (LEAVING ring state) when -ingester.heartbeat-period=0 and -ingester.unregister-on-shutdown=false. #4366
  • [BUGFIX] Ingester: panic during shutdown while fetching batches from cache. #4397
  • [BUGFIX] Querier: After query-frontend restart, querier may have lower than configured concurrency. #4417
  • [BUGFIX] Memberlist: forward only changes, not entire original message. #4419
  • [BUGFIX] Memberlist: don't accept old tombstones as incoming change, and don't forward such messages to other gossip members. #4420
  • [BUGFIX] Querier: fixed panic when querying exemplars and using -distributor.shard-by-all-labels=false. #4473
  • [BUGFIX] Querier: honor querier minT,maxT if nil SelectHints are passed to Select(). #4413
  • [BUGFIX] Compactor: fixed panic while collecting Prometheus metrics. #4483
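
The compactor fix listed above (#4328) changes which blocks are considered for compaction. The Python sketch below contrasts the old and new selection rules as the changelog entry describes them. The `Block` type, its field names and both function names are assumptions made for illustration; the real compactor works on block metadata in object storage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    block_id: str
    # Seconds since the block was marked for deletion; None means not marked.
    marked_for_deletion_age_s: Optional[float] = None


def eligible_before_fix(blocks: List[Block], deletion_delay_s: float) -> List[str]:
    """Old rule: blocks marked less than deletion_delay / 2 ago were still eligible."""
    return [
        b.block_id
        for b in blocks
        if b.marked_for_deletion_age_s is None
        or b.marked_for_deletion_age_s < deletion_delay_s / 2
    ]


def eligible_after_fix(blocks: List[Block]) -> List[str]:
    """New rule: any block already marked for deletion is skipped."""
    return [b.block_id for b in blocks if b.marked_for_deletion_age_s is None]
```

With a 12 hour deletion delay, a block marked ten minutes ago would have been picked up under the old rule and is now left alone.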

Cortex 1.10.0

05 Aug 10:52
3b9f1c3

This release contains 108 contributions from 37 authors. Thank you!

Highlights

  • Chunks storage has been deprecated and is now in maintenance mode.
  • Exemplars now supported - in-memory only.
  • Added many new limits, to help protect your installation against overload.
  • The sharding feature in Alertmanager is now considered complete.
  • Release now has ARM binaries and packages (but not container images, yet).

Cortex

  • [CHANGE] Prevent path traversal attack from users able to control the HTTP header X-Scope-OrgID. #4375 (CVE-2021-36157)
    • Users only have control of the HTTP header when Cortex is not frontended by an auth proxy validating the tenant IDs
  • [CHANGE] Enable strict JSON unmarshal for pkg/util/validation.Limits struct. The custom UnmarshalJSON() will now fail if the input has unknown fields. #4298
  • [CHANGE] Cortex chunks storage has been deprecated and it's now in maintenance mode: all Cortex users are encouraged to migrate to the blocks storage. No new features will be added to the chunks storage. The default Cortex configuration still runs the chunks engine; please check out the blocks storage doc on how to configure Cortex to run with the blocks storage. #4268
  • [CHANGE] The example Kubernetes manifests (stored at k8s/) have been removed due to a lack of proper support and maintenance. #4268
  • [CHANGE] Querier / ruler: deprecated -store.query-chunk-limit CLI flag (and its respective YAML config option max_chunks_per_query) in favour of -querier.max-fetched-chunks-per-query (and its respective YAML config option max_fetched_chunks_per_query). The new limit specifies the maximum number of chunks that can be fetched in a single query from ingesters and long-term storage: the total number of actual fetched chunks could be 2x the limit, being independently applied when querying ingesters and long-term storage. #4125
  • [CHANGE] Alertmanager: allowed to configure the experimental receivers firewall on a per-tenant basis. The following CLI flags (and their respective YAML config options) have been changed and moved to the limits config section: #4143
    • -alertmanager.receivers-firewall.block.cidr-networks renamed to -alertmanager.receivers-firewall-block-cidr-networks
    • -alertmanager.receivers-firewall.block.private-addresses renamed to -alertmanager.receivers-firewall-block-private-addresses
  • [CHANGE] Change default value of -server.grpc.keepalive.min-time-between-pings from 5m to 10s and -server.grpc.keepalive.ping-without-stream-allowed to true. #4168
  • [CHANGE] Ingester: Change default value of -ingester.active-series-metrics-enabled to true. This incurs a small increase in memory usage, between 1.2% and 1.6% as measured on ingesters with 1.3M active series. #4257
  • [CHANGE] Dependency: update go-redis from v8.2.3 to v8.9.0. #4236
  • [FEATURE] Querier: Added new -querier.max-fetched-series-per-query flag. When Cortex is running with blocks storage, the max series per query limit is enforced in the querier and applies to unique series received from ingesters and store-gateway (long-term storage). #4179
  • [FEATURE] Querier/Ruler: Added new -querier.max-fetched-chunk-bytes-per-query flag. When Cortex is running with blocks storage, the max chunk bytes limit is enforced in the querier and ruler and limits the size of all aggregated chunks returned from ingesters and storage as bytes for a query. #4216
  • [FEATURE] Alertmanager: support negative matchers, time-based muting - upstream release notes. #4237
  • [FEATURE] Alertmanager: Added rate-limits to notifiers. Rate limits used by all integrations can be configured using -alertmanager.notification-rate-limit, while per-integration rate limits can be specified via -alertmanager.notification-rate-limit-per-integration parameter. Both shared and per-integration limits can be overwritten using overrides mechanism. These limits are applied on individual (per-tenant) alertmanagers. Rate-limited notifications are failed notifications. It is possible to monitor rate-limited notifications via new cortex_alertmanager_notification_rate_limited_total metric. #4135 #4163
  • [FEATURE] Alertmanager: Added -alertmanager.max-config-size-bytes limit to control size of configuration files that Cortex users can upload to Alertmanager via API. This limit is configurable per-tenant. #4201
  • [FEATURE] Alertmanager: Added -alertmanager.max-templates-count and -alertmanager.max-template-size-bytes options to control number and size of templates uploaded to Alertmanager via API. These limits are configurable per-tenant. #4223
  • [FEATURE] Added flag -debug.block-profile-rate to enable goroutine blocking events profiling. #4217
  • [FEATURE] Alertmanager: The experimental sharding feature is now considered complete. Detailed information about the configuration options can be found here for alertmanager and here for the alertmanager storage. To use the feature: #3925 #4020 #4021 #4031 #4084 #4110 #4126 #4127 #4141 #4146 #4161 #4162 #4222
    • Ensure that a remote storage backend is configured for Alertmanager to store state using -alertmanager-storage.backend, and flags related to the backend. Note that the local and configdb storage backends are not supported.
    • Ensure that a ring store is configured using -alertmanager.sharding-ring.store, and set the flags relevant to the chosen store type.
    • Enable the feature using -alertmanager.sharding-enabled.
    • Note the prior addition of a new configuration option -alertmanager.persist-interval. This sets the interval between persisting the current alertmanager state (notification log and silences) to object storage. See the configuration file reference for more information.
  • [ENHANCEMENT] Alertmanager: Cleanup persisted state objects from remote storage when a tenant configuration is deleted. #4167
  • [ENHANCEMENT] Storage: Added the ability to disable Open Census within GCS client (e.g -gcs.enable-opencensus=false). #4219
  • [ENHANCEMENT] Etcd: Added username and password to etcd config. #4205
  • [ENHANCEMENT] Alertmanager: introduced new metrics to monitor operation when using -alertmanager.sharding-enabled: #4149
    • cortex_alertmanager_state_fetch_replica_state_total
    • cortex_alertmanager_state_fetch_replica_state_failed_total
    • cortex_alertmanager_state_initial_sync_total
    • cortex_alertmanager_state_initial_sync_completed_total
    • cortex_alertmanager_state_initial_sync_duration_seconds
    • cortex_alertmanager_state_persist_total
    • cortex_alertmanager_state_persist_failed_total
  • [ENHANCEMENT] Blocks storage: support ingesting exemplars and querying of exemplars. Enabled by setting new CLI flag -blocks-storage.tsdb.max-exemplars=<n> or config option blocks_storage.tsdb.max_exemplars to positive value. #4124 #4181
  • [ENHANCEMENT] Distributor: Added distributors ring status section in the admin page. #4151
  • [ENHANCEMENT] Added zone-awareness support to alertmanager for use when sharding is enabled. When zone-awareness is enabled, alerts will be replicated across availability zones. #4204
  • [ENHANCEMENT] Added tenant_ids tag to tracing spans #4186
  • [ENHANCEMENT] Ring, query-frontend: Avoid using automatic private IPs (APIPA) when discovering IP address from the interface during the registration of the instance in the ring, or by query-frontend when used with query-scheduler. APIPA is still used as a last resort, with logging to indicate its usage. #4032
  • [ENHANCEMENT] Memberlist: introduced new metrics to aid troubleshooting tombstone convergence: #4231
    • memberlist_client_kv_store_value_tombstones
    • memberlist_client_kv_store_value_tombstones_removed_total
    • memberlist_client_messages_to_broadcast_dropped_total
  • [ENHANCEMENT] Alertmanager: Added -alertmanager.max-dispatcher-aggregation-groups option to control max number of active dispatcher groups in Alertmanager (per tenant, also overrideable). When the limit is reached, Dispatcher produces log message and increases cortex_alertmanager_dispatcher_aggregation_group_limit_reached_total metric. #4254
  • [ENHANCEMENT] Alertmanager: Added -alertmanager.max-alerts-count and -alertmanager.max-alerts-size-bytes to control max number of alerts and total size of alerts that a single user can have in Alertmanager's memory. Adding more alerts will fail with a log message and incrementing cortex_alertmanager_alerts_insert_limited_total metric (per-user). These limits can be overridden by using per-tenant overrides. Current values are tracked in cortex_alertmanager_alerts_limiter_current_alerts and cortex_alertmanager_alerts_limiter_current_alerts_size_bytes metrics. #4253
  • [ENHANCEMENT] Store-gateway: added -store-gateway.sharding-ring.wait-stability-min-duration and -store-gateway.sharding-ring.wait-stability-max-duration support to store-gateway, to wait for ring stability at startup. #4271
  • [ENHANCEMENT] Ruler: added rule_group label to metrics cortex_prometheus_rule_group_iterations_total and cortex_prometheus_rule_group_iterations_missed_total. #4121
  • [ENHANCEMENT] Ruler: added new metrics for tracking total number of queries and push requests sent to ingester, as well as failed queries and push requests. Failures are only counted for internal errors, but not user-errors like limits or invalid query. This is in contrast to existing cortex_prometheus_rule_evaluation_failures_total, which is incremented also when query or samples appending fails due to user-errors. #4281
    • `cortex_ruler_write_reques...

Cortex 1.10.0-rc.1

21 Jul 15:15
Pre-release

This is exactly the same as 1.10.0-rc.0, with the addition of a fix to CVE-2021-36157 - #4375
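
The 1.10.0 notes describe CVE-2021-36157 as a path traversal by users able to control the X-Scope-OrgID header. The Python sketch below shows the general defence of validating a tenant ID before it is used as a path component. It is an illustration only: the function names and the exact set of rejected values are assumptions, and the rules Cortex applies in #4375 may be stricter.

```python
import os


def is_safe_tenant_id(tenant_id: str) -> bool:
    """Reject IDs that could escape a per-tenant directory (hypothetical rule set)."""
    if tenant_id in ("", ".", ".."):
        return False
    # Path separators would let the ID point outside the tenant's own directory.
    return "/" not in tenant_id and "\\" not in tenant_id


def tenant_dir(base_dir: str, tenant_id: str) -> str:
    """Build the per-tenant directory path, refusing unsafe tenant IDs."""
    if not is_safe_tenant_id(tenant_id):
        raise ValueError(f"unsafe tenant ID: {tenant_id!r}")
    return os.path.join(base_dir, tenant_id)
```

Validating at the point where the header is read, rather than where files are opened, keeps every later use of the tenant ID covered by the same check.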

Cortex 1.10.0-rc.0

21 Jul 14:28
Pre-release

This was a release candidate for 1.10.0.

1.9.0 / 2021-05-14

18 May 10:51
v1.9.0
ed4f339

This release contains 131 contributions from 28 authors. Thank you!

Highlights

  • Several exciting features have become stable: shuffle-sharding, querying chunks and blocks store simultaneously, lazy mmap-ing of block indexes, etc.
  • Several query and ingest performance improvements!
  • Tons of bugfixes and optimisations!

Changelog

  • [CHANGE] Alertmanager now removes local files after Alertmanager is no longer running for removed or resharded user. #3910
  • [CHANGE] Alertmanager now stores local files in per-tenant folders. Files stored by Alertmanager previously are migrated to new hierarchy. Support for this migration will be removed in Cortex 1.11. #3910
  • [CHANGE] Ruler: deprecated -ruler.storage.* CLI flags (and their respective YAML config options) in favour of -ruler-storage.*. The deprecated config will be removed in Cortex 1.11. #3945
  • [CHANGE] Alertmanager: deprecated -alertmanager.storage.* CLI flags (and their respective YAML config options) in favour of -alertmanager-storage.*. This change doesn't apply to -alertmanager.storage.path and -alertmanager.storage.retention. The deprecated config will be removed in Cortex 1.11. #4002
  • [CHANGE] Alertmanager: removed -cluster. CLI flags deprecated in Cortex 1.7. The new config options to use are: #3946
    • -alertmanager.cluster.listen-address instead of -cluster.listen-address
    • -alertmanager.cluster.advertise-address instead of -cluster.advertise-address
    • -alertmanager.cluster.peers instead of -cluster.peer
    • -alertmanager.cluster.peer-timeout instead of -cluster.peer-timeout
  • [CHANGE] Blocks storage: removed the config option -blocks-storage.bucket-store.index-cache.postings-compression-enabled, which was deprecated in Cortex 1.6. Postings compression is always enabled. #4101
  • [CHANGE] Querier: removed the config option -store.max-look-back-period, which was deprecated in Cortex 1.6 and was used only by the chunks storage. You should use -querier.max-query-lookback instead. #4101
  • [CHANGE] Query Frontend: removed the config option -querier.compress-http-responses, which was deprecated in Cortex 1.6. You should use -api.response-compression-enabled instead. #4101
  • [CHANGE] Runtime-config / overrides: removed the config options -limits.per-user-override-config (use -runtime-config.file) and -limits.per-user-override-period (use -runtime-config.reload-period), both deprecated since Cortex 0.6.0. #4112
  • [CHANGE] Cortex now fails fast on startup if unable to connect to the ring backend. #4068
  • [FEATURE] The following features have been marked as stable: #4101
    • Shuffle-sharding
    • Querier support for querying chunks and blocks store at the same time
    • Tracking of active series and exporting them as metrics (-ingester.active-series-metrics-enabled and related flags)
    • Blocks storage: lazy mmap of block indexes in the store-gateway (-blocks-storage.bucket-store.index-header-lazy-loading-enabled)
    • Ingester: close idle TSDB and remove them from local disk (-blocks-storage.tsdb.close-idle-tsdb-timeout)
  • [FEATURE] Memberlist: add TLS configuration options for the memberlist transport layer used by the gossip KV store. #4046
    • New flags added for memberlist communication:
      • -memberlist.tls-enabled
      • -memberlist.tls-cert-path
      • -memberlist.tls-key-path
      • -memberlist.tls-ca-path
      • -memberlist.tls-server-name
      • -memberlist.tls-insecure-skip-verify
  • [FEATURE] Ruler: added local backend support to the ruler storage configuration under the -ruler-storage. flag prefix. #3932
  • [ENHANCEMENT] Upgraded Docker base images to alpine:3.13. #4042
  • [ENHANCEMENT] Blocks storage: reduce ingester memory by eliminating series reference cache. #3951
  • [ENHANCEMENT] Ruler: optimized <prefix>/api/v1/rules and <prefix>/api/v1/alerts when ruler sharding is enabled. #3916
  • [ENHANCEMENT] Ruler: added the following metrics when ruler sharding is enabled: #3916
    • cortex_ruler_clients
    • cortex_ruler_client_request_duration_seconds
  • [ENHANCEMENT] Alertmanager: Add API endpoint to list all tenant alertmanager configs: GET /multitenant_alertmanager/configs. #3529
  • [ENHANCEMENT] Ruler: Add API endpoint to list all tenant ruler rule groups: GET /ruler/rule_groups. #3529
  • [ENHANCEMENT] Query-frontend/scheduler: added querier forget delay (-query-frontend.querier-forget-delay and -query-scheduler.querier-forget-delay) to mitigate the blast radius in the event queriers crash because of a repeatedly sent "query of death" when shuffle-sharding is enabled. #3901
  • [ENHANCEMENT] Query-frontend: reduced memory allocations when serializing query response. #3964
  • [ENHANCEMENT] Querier / ruler: some optimizations to PromQL query engine. #3934 #3989
  • [ENHANCEMENT] Ingester: reduce CPU and memory when a high number of errors are returned by the ingester on the write path with the blocks storage. #3969 #3971 #3973
  • [ENHANCEMENT] Distributor: reduce CPU and memory when a high number of errors are returned by the distributor on the write path. #3990
  • [ENHANCEMENT] Put metric before label value in the "label value too long" error message. #4018
  • [ENHANCEMENT] Allow use of y|w|d suffixes for duration related limits and per-tenant limits. #4044
  • [ENHANCEMENT] Query-frontend: Small optimization on top of PR #3968 to avoid unnecessary Extents merging. #4026
  • [ENHANCEMENT] Add a metric cortex_compactor_compaction_interval_seconds for the compaction interval config value. #4040
  • [ENHANCEMENT] Ingester: added the following per-ingester (instance) experimental limits: max number of series in memory (-ingester.instance-limits.max-series), max number of users in memory (-ingester.instance-limits.max-tenants), max ingestion rate (-ingester.instance-limits.max-ingestion-rate), and max inflight requests (-ingester.instance-limits.max-inflight-push-requests). These limits are only used with blocks storage. Limits can also be configured using the runtime-config feature, and current values are exported as the cortex_ingester_instance_limits metric. #3992
  • [ENHANCEMENT] Cortex is now built with Go 1.16. #4062
  • [ENHANCEMENT] Distributor: added per-distributor experimental limits: max number of inflight requests (-distributor.instance-limits.max-inflight-push-requests) and max ingestion rate in samples/sec (-distributor.instance-limits.max-ingestion-rate). If not set, these two are unlimited. Also added metrics to expose current values (cortex_distributor_inflight_push_requests, cortex_distributor_ingestion_rate_samples_per_second) as well as limits (cortex_distributor_instance_limits with various limit label values). #4071
  • [ENHANCEMENT] Ruler: Added -ruler.enabled-tenants and -ruler.disabled-tenants to explicitly enable or disable rules processing for specific tenants. #4074
  • [ENHANCEMENT] Block Storage Ingester: /flush now accepts two new parameters: tenant to specify the tenant to flush, and wait=true to make the call synchronous. Multiple tenants can be specified by repeating the tenant parameter. If no tenant is specified, all tenants are flushed, as before. #4073
  • [ENHANCEMENT] Alertmanager: validate configured -alertmanager.web.external-url and fail if ends with /. #4081
  • [ENHANCEMENT] Alertmanager: added -alertmanager.receivers-firewall.block.cidr-networks and -alertmanager.receivers-firewall.block.private-addresses to block specific network addresses in HTTP-based Alertmanager receiver integrations. #4085
  • [ENHANCEMENT] Allow configuration of Cassandra's host selection policy. #4069
  • [ENHANCEMENT] Store-gateway: retry syncing blocks if a per-tenant sync fails. #3975 #4088
  • [ENHANCEMENT] Add metric cortex_tcp_connections exposing the current number of accepted TCP connections. #4099
  • [ENHANCEMENT] Querier: Allow federated queries to run concurrently. #4065
  • [ENHANCEMENT] Label Values API call now supports match[] parameter when querying blocks on storage (assuming -querier.query-store-for-labels-enabled is enabled). #4133
  • [BUGFIX] Ruler-API: fix bug where the /api/v1/rules/<namespace>/<group_name> endpoint returned 400 instead of 404. #4013
  • [BUGFIX] Distributor: reverted changes done to rate limiting in #3825. #3948
  • [BUGFIX] Ingester: Fix race condition when opening and closing tsdb concurrently. #3959
  • [BUGFIX] Querier: streamline tracing spans. #3924
  • [BUGFIX] Ruler Storage: ignore objects with empty namespace or group in the name. #3999
  • [BUGFIX] Distributor: fix issue causing distributors to not extend the replication set because of failing instances when zone-aware replication is enabled. #3977
  • [BUGFIX] Query-frontend: Fix issue where cached entry size keeps increasing when making tiny query repeatedly. #3968
  • [BUGFIX] Compactor: -compactor.blocks-retention-period now supports weeks (w) and years (y). #4027
  • [BUGFIX] Querier: return 422 (instead of 500) when a query hits the max_chunks_per_query limit with block storage and the limit is hit in the store-gateway. #3937
  • [BUGFIX] Ruler: Rule group limit enforcement should now allow the same number of rules in a group as the limit. #3616
  • [BUGFIX] Frontend, Query-scheduler: allow querier to notify about shutdown without providing any authentication. #4066
  • [BUGFIX] Querier: fixed race condition causing queries to fail right after querier startup with the "empty ring" error. #4068
  • [BUGFIX] Compactor: Increment cortex_compactor_runs_failed_total if the compactor fails to compact a single tenant. #4094
  • [BUGFIX] Tracing: hotfix to prevent the Jaeger tracing client from indefinitely blocking the Cortex process shutdown when the HTTP connection to the tracing backend is blocked. #4134
  • [BUGFIX] Forward proper EndsAt from ruler to Alertmanager in line with Prometheus behaviour. #4017
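
Several [CHANGE] entries above remove CLI flags that have a direct replacement. As a rough aid when auditing an existing command line before upgrading, the sketch below renames those flags in an argument list. The mapping is copied from the entries above; the helper name migrate_args and the sample arguments are hypothetical and not part of Cortex. It only rewrites flag names and assumes the single-dash spelling used in this changelog. It does not check whether the value format of a replacement flag differs from the old one, so review the result by hand.

```python
# Removed flag -> replacement, as listed in the [CHANGE] entries above.
RENAMED_FLAGS = {
    "-cluster.listen-address": "-alertmanager.cluster.listen-address",
    "-cluster.advertise-address": "-alertmanager.cluster.advertise-address",
    "-cluster.peer": "-alertmanager.cluster.peers",
    "-cluster.peer-timeout": "-alertmanager.cluster.peer-timeout",
    "-store.max-look-back-period": "-querier.max-query-lookback",
    "-querier.compress-http-responses": "-api.response-compression-enabled",
    "-limits.per-user-override-config": "-runtime-config.file",
    "-limits.per-user-override-period": "-runtime-config.reload-period",
}


def migrate_args(args):
    """Return (new_args, renamed) for a list of command-line arguments.

    Handles both "-flag=value" and "-flag value" forms, because only the
    flag name (the part before the first "=") is looked up and replaced.
    """
    new_args = []
    renamed = []
    for arg in args:
        name, sep, value = arg.partition("=")
        replacement = RENAMED_FLAGS.get(name)
        if replacement is None:
            new_args.append(arg)  # unknown or still-valid flag: keep as-is
        else:
            new_args.append(replacement + sep + value)
            renamed.append((name, replacement))
    return new_args, renamed


if __name__ == "__main__":
    # Hypothetical arguments, for illustration only.
    old_args = [
        "-cluster.peer-timeout=15s",
        "-limits.per-user-override-config=/etc/cortex/overrides.yaml",
        "-target=alertmanager",
    ]
    new_args, renamed = migrate_args(old_args)
    for before, after in renamed:
        print(f"renamed {before} to {after}")
    print(" ".join(new_args))
```

Flags that were removed without a replacement, such as -blocks-storage.bucket-store.index-cache.postings-compression-enabled, are deliberately left out of the mapping and must be dropped manually.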

Blocksconvert

  • [ENHANCEMENT] Builder: add `-builder.timestamp-...