
Releases: cortexproject/cortex

Cortex 1.3.0-rc.2

17 Aug 10:18
v1.3.0-rc.2
8d6a573
Pre-release

This is the third release candidate for Cortex 1.3.0, including a bug fix:

  • [BUGFIX] Querier: query /series from ingesters regardless of the -querier.query-ingesters-within setting. #3035

Cortex 1.3.0-rc.1

10 Aug 13:33
v1.3.0-rc.1
0929895
Pre-release

This is the second release candidate for Cortex 1.3.0, including a bug fix and an improvement:

  • [ENHANCEMENT] Ingester: added Dropped outcome to metric cortex_ingester_flushing_dequeued_series_total. #2998
  • [BUGFIX] Ruler: fixed an unintentional breaking change introduced in the ruler's alertmanager_url YAML config option, which changed the value from a string to a list of strings. #2989
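The alertmanager_url fix above is about a config value whose type changed between a string and a list of strings. As a rough illustration only, a client-side tool that has to cope with both shapes could normalise the value as sketched below. The helper name and the comma separator for the string form are assumptions made for this sketch; they are not taken from these notes and do not describe how Cortex parses the option.

```python
from typing import List, Union


def normalize_alertmanager_urls(value: Union[str, List[str]]) -> List[str]:
    """Return a list of Alertmanager URLs from a string or a list of strings.

    Hypothetical helper: the comma separator used for the string form is an
    assumption of this sketch, not something stated in the release notes.
    """
    if isinstance(value, str):
        parts = value.split(",")
    else:
        parts = list(value)
    # Drop surrounding whitespace and skip empty entries.
    return [part.strip() for part in parts if part.strip()]
```

Both `normalize_alertmanager_urls("http://a:9093")` and `normalize_alertmanager_urls(["http://a:9093"])` then yield the same one-element list, so downstream code only has to handle one shape.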

Cortex 1.3.0-rc.0

05 Aug 12:59
v1.3.0-rc.0
fff7ce5
Pre-release

This Cortex release features 125 contributions from 37 different authors. It's yet another great milestone we have reached thanks to the amazing support from our community ❤️ Thanks!

Highlights:

  • The blocks storage is getting closer to production readiness. In this release we've made several fixes and improvements. In particular, you should be aware of:
    • Some CLI flags and YAML config options have been renamed
    • The store-gateway service is now mandatory when running the blocks storage
    • Introduced support for a live cluster migration from chunks to blocks (and rollback)
    • Introduced support to flush blocks on-demand from ingesters
  • The ruler and alertmanager got several improvements, including but not limited to:
    • The ruler now runs in the single binary when Cortex gets started with -target=all
    • Introduced new config options to fine-tune the ruler
    • Introduced support for loading locally stored rules (e.g. loaded via a Kubernetes config map)
    • Multiple alertmanager URLs can now be specified in the ruler; each URL is treated as a separate alertmanager group
    • Alertmanager configuration can be persisted to object storage via API
  • Other changes worth noting:
    • Added optional snappy compression support to internal gRPC connections
    • Starting from this release we're going to publish .rpm and .deb packages too

Please refer to the full changelog for the full list of changes and improvements.

Changelog

  • [CHANGE] Replace the metric cortex_alertmanager_configs with cortex_alertmanager_config_invalid exposed by Alertmanager. #2960
  • [CHANGE] Experimental Delete Series: Change target flag for purger from data-purger to purger. #2777
  • [CHANGE] Experimental blocks storage: The max concurrent queries against the long-term storage, configured via -experimental.blocks-storage.bucket-store.max-concurrent, is now a limit shared across all tenants and not a per-tenant limit anymore. The default value has changed from 20 to 100 and the following new metrics have been added: #2797
    • cortex_bucket_stores_gate_queries_concurrent_max
    • cortex_bucket_stores_gate_queries_in_flight
    • cortex_bucket_stores_gate_duration_seconds
  • [CHANGE] Metric cortex_ingester_flush_reasons has been renamed to cortex_ingester_flushing_enqueued_series_total, and a new metric cortex_ingester_flushing_dequeued_series_total with an outcome label (superset of reason) has been added. #2802, #2818
  • [CHANGE] Experimental Delete Series: Metric cortex_purger_oldest_pending_delete_request_age_seconds now tracks the age of delete requests from the end of their cancellation period instead of from their creation time. #2806
  • [CHANGE] Experimental blocks storage: the store-gateway service is required in a Cortex cluster running with the experimental blocks storage. Removed the -experimental.tsdb.store-gateway-enabled CLI flag and store_gateway_enabled YAML config option. The store-gateway is now always enabled when the storage engine is blocks. #2822
  • [CHANGE] Experimental blocks storage: removed support for -experimental.blocks-storage.bucket-store.max-sample-count flag because the implementation was flawed. To limit the number of samples/chunks processed by a single query you can set -store.query-chunk-limit, which is now supported by the blocks storage too. #2852
  • [CHANGE] Ingester: Chunks flushed via /flush stay in memory until retention period is reached. This affects cortex_ingester_memory_chunks metric. #2778
  • [CHANGE] Querier: the error message returned when the query time range exceeds -store.max-query-length has changed from invalid query, length > limit (X > Y) to the query time range exceeds the limit (query length: X, limit: Y). #2826
  • [CHANGE] Add component label to metrics exposed by chunk, delete and index store clients. #2774
  • [CHANGE] Querier: when -querier.query-ingesters-within is configured, the time range of the query sent to ingesters is now manipulated to ensure the query start time is not older than 'now - query-ingesters-within'. #2904
  • [CHANGE] KV: The role label, previously exposed only by the multi KV store client, has been added to the metrics of every KV store client. If the KV store client is not multi, the value of the role label is primary. #2837
  • [CHANGE] Added the engine label to the metrics exposed by the Prometheus query engine, to distinguish between ruler and querier metrics. #2854
  • [CHANGE] Added ruler to the single binary when started with -target=all (default). #2854
  • [CHANGE] Experimental blocks storage: compact head when opening TSDB. This should only affect ingester startup after it was unable to compact head in previous run. #2870
  • [CHANGE] Metric cortex_overrides_last_reload_successful has been renamed to cortex_runtime_config_last_reload_successful. #2874
  • [CHANGE] HipChat support has been removed from the alertmanager (because it has been removed from upstream Prometheus too). #2902
  • [CHANGE] Add constant label name to metric cortex_cache_request_duration_seconds. #2903
  • [CHANGE] Add user label to metric cortex_query_frontend_queue_length. #2939
  • [CHANGE] Experimental blocks storage: cleaned up the config and renamed "TSDB" to "blocks storage". #2937
    • The storage engine setting value has been changed from tsdb to blocks; this affects -store.engine CLI flag and its respective YAML option.
    • The root level YAML config has changed from tsdb to blocks_storage
    • The prefix of all CLI flags has changed from -experimental.tsdb. to -experimental.blocks-storage.
    • The following settings have been grouped under tsdb property in the YAML config and their CLI flags changed:
      • -experimental.tsdb.dir changed to -experimental.blocks-storage.tsdb.dir
      • -experimental.tsdb.block-ranges-period changed to -experimental.blocks-storage.tsdb.block-ranges-period
      • -experimental.tsdb.retention-period changed to -experimental.blocks-storage.tsdb.retention-period
      • -experimental.tsdb.ship-interval changed to -experimental.blocks-storage.tsdb.ship-interval
      • -experimental.tsdb.ship-concurrency changed to -experimental.blocks-storage.tsdb.ship-concurrency
      • -experimental.tsdb.max-tsdb-opening-concurrency-on-startup changed to -experimental.blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup
      • -experimental.tsdb.head-compaction-interval changed to -experimental.blocks-storage.tsdb.head-compaction-interval
      • -experimental.tsdb.head-compaction-concurrency changed to -experimental.blocks-storage.tsdb.head-compaction-concurrency
      • -experimental.tsdb.head-compaction-idle-timeout changed to -experimental.blocks-storage.tsdb.head-compaction-idle-timeout
      • -experimental.tsdb.stripe-size changed to -experimental.blocks-storage.tsdb.stripe-size
      • -experimental.tsdb.wal-compression-enabled changed to -experimental.blocks-storage.tsdb.wal-compression-enabled
      • -experimental.tsdb.flush-blocks-on-shutdown changed to -experimental.blocks-storage.tsdb.flush-blocks-on-shutdown
  • [CHANGE] Flags -bigtable.grpc-use-gzip-compression, -ingester.client.grpc-use-gzip-compression, -querier.frontend-client.grpc-use-gzip-compression are now deprecated. #2940
  • [CHANGE] Limit errors reported by ingester during query-time now return HTTP status code 422. #2941
  • [FEATURE] Introduced ruler.for-outage-tolerance: the maximum time to tolerate an outage when restoring the "for" state of an alert. #2783
  • [FEATURE] Introduced ruler.for-grace-period: the minimum duration between an alert and its restored "for" state. This is maintained only for alerts whose configured "for" time is greater than the grace period. #2783
  • [FEATURE] Introduced ruler.resend-delay: the minimum amount of time to wait before resending an alert to Alertmanager. #2783
  • [FEATURE] Ruler: added local filesystem support to store rules (read-only). #2854
  • [ENHANCEMENT] Upgraded Docker base images to alpine:3.12. #2862
  • [ENHANCEMENT] Experimental: Querier can now optionally query secondary store. This is specified by using -querier.second-store-engine option, with values chunks or blocks. Standard configuration options for this store are used. Additionally, this querying can be configured to happen only for queries that need data older than -querier.use-second-store-before-time. Default value of zero will always query secondary store. #2747
  • [ENHANCEMENT] Query-tee: increased the cortex_querytee_request_duration_seconds metric buckets granularity. #2799
  • [ENHANCEMENT] Query-tee: fail to start if the configured -backend.preferred is unknown. #2799
  • [ENHANCEMENT] Ruler: Added the following metrics: #2786
    • cortex_prometheus_notifications_latency_seconds
    • cortex_prometheus_notifications_errors_total
    • cortex_prometheus_notifications_sent_total
    • cortex_prometheus_notifications_dropped_total
    • cortex_prometheus_notifications_queue_length
    • cortex_prometheus_notifications_queue_capacity
    • cortex_prometheus_notifications_alertmanagers_discovered
  • [ENHANCEMENT] The behavior of the /ready endpoint was changed for the query frontend so that it indicates when the frontend is ready to accept queries. This is intended for use by a read path load balancer that wants to wait for the frontend to have attached queriers before including it in the backend. #2733
  • [ENHANCEMENT] Experimental Delete Series: Add support for deletion of chunks for remaining stores. #2801
  • [ENHANCEMENT] Add -modules command line flag to list possible values for -target. Also, log a warning if the given target is an internal component. #2752
  • [ENHANCEMENT] Added -ingester.flush-on-shutdown-with-wal-enabled option to enable chunks flushing even when WAL is enabled. #2780
  • [ENHANCEMENT] Query-tee: Support for custom API prefix by using -server.path-prefix option. #2814
  • [ENHANCEMENT] Query-tee: Forward `X-...
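The blocks storage flag renames listed in #2937 above are mechanical, so a small script can help when updating deployment manifests. The following is a minimal sketch, assuming flags are passed in the `-name=value` form; the function name `migrate_flag` is hypothetical and the rules are taken only from the rename list in this changelog.

```python
OLD_PREFIX = "-experimental.tsdb."
NEW_PREFIX = "-experimental.blocks-storage."

# Settings the changelog lists as grouped under the "tsdb" property.
TSDB_GROUPED = {
    "dir",
    "block-ranges-period",
    "retention-period",
    "ship-interval",
    "ship-concurrency",
    "max-tsdb-opening-concurrency-on-startup",
    "head-compaction-interval",
    "head-compaction-concurrency",
    "head-compaction-idle-timeout",
    "stripe-size",
    "wal-compression-enabled",
    "flush-blocks-on-shutdown",
}


def migrate_flag(arg: str) -> str:
    """Rewrite one pre-1.3.0 CLI argument to its 1.3.0 name (hypothetical helper)."""
    name, sep, value = arg.partition("=")
    # The storage engine value changed from "tsdb" to "blocks".
    if name == "-store.engine" and value == "tsdb":
        return "-store.engine=blocks"
    if not name.startswith(OLD_PREFIX):
        return arg  # Not a blocks storage flag: leave untouched.
    suffix = name[len(OLD_PREFIX):]
    if suffix in TSDB_GROUPED:
        new_name = NEW_PREFIX + "tsdb." + suffix
    else:
        new_name = NEW_PREFIX + suffix
    return new_name + sep + value
```

For example, `migrate_flag("-experimental.tsdb.dir=/data")` returns `-experimental.blocks-storage.tsdb.dir=/data`. The sketch only renames: flags that this release removed outright (such as -experimental.tsdb.store-gateway-enabled) would still be renamed rather than dropped, so they need to be reviewed by hand.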
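The -querier.query-ingesters-within change (#2904) in the changelog above amounts to clamping the start of the time range sent to ingesters. A minimal sketch of that arithmetic follows. The function name is hypothetical, and two behaviours are assumptions of this sketch rather than statements from the changelog: treating a zero duration as "no clamping", and returning None when the whole range is older than the cutoff.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def clamp_ingester_query_range(
    start: datetime,
    end: datetime,
    now: datetime,
    query_ingesters_within: timedelta,
) -> Optional[Tuple[datetime, datetime]]:
    """Return the (start, end) range to send to ingesters, or None to skip them."""
    # Assumption: a zero (or negative) duration disables the clamping.
    if query_ingesters_within <= timedelta(0):
        return start, end
    oldest_allowed = now - query_ingesters_within
    # Assumption: a range entirely older than the cutoff is not sent at all.
    if end < oldest_allowed:
        return None
    # The start time must not be older than "now - query-ingesters-within".
    return max(start, oldest_allowed), end
```

With a 12 hour setting, a query covering the last 24 hours would be sent to ingesters with its start moved forward to 12 hours ago, while the long-term storage is still queried for the full range.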

Cortex 1.2.0

01 Jul 17:20
cd9e38d

This release has a number of bug-fixes and enhancements, particularly:

  • Memberlist KV client is no longer considered experimental. #2725
  • 3rd-party index and chunk stores using gRPC client/server plugin mechanism (experimental) #2220
  • Using an invalid flag no longer causes printing of all available flags. #2691 (my favourite change!)

Many thanks to all contributors.

Detailed list of changes:

  • [CHANGE] Metric cortex_kv_request_duration_seconds now includes name label to denote which client is being used as well as the backend label to denote the KV backend implementation in use. #2648
  • [CHANGE] Experimental Ruler: Rule groups persisted to object storage using the experimental API have an updated object key encoding to better handle special characters. Rule groups previously-stored using object storage must be renamed to the new format. #2646
  • [CHANGE] Query Frontend now uses Round Robin to choose a tenant queue to service next. #2553
  • [CHANGE] -promql.lookback-delta is now deprecated and has been replaced by -querier.lookback-delta along with lookback_delta entry under querier in the config file. -promql.lookback-delta will be removed in v1.4.0. #2604
  • [CHANGE] Experimental TSDB: removed -experimental.tsdb.bucket-store.binary-index-header-enabled flag. Now the binary index-header is always enabled.
  • [CHANGE] Experimental TSDB: Renamed index-cache metrics to use original metric names from Thanos, as Cortex is not aggregating them in any way: #2627
    • cortex_<service>_blocks_index_cache_items_evicted_total => thanos_store_index_cache_items_evicted_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_items_added_total => thanos_store_index_cache_items_added_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_requests_total => thanos_store_index_cache_requests_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_items_overflowed_total => thanos_store_index_cache_items_overflowed_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_hits_total => thanos_store_index_cache_hits_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_items => thanos_store_index_cache_items{name="index-cache"}
    • cortex_<service>_blocks_index_cache_items_size_bytes => thanos_store_index_cache_items_size_bytes{name="index-cache"}
    • cortex_<service>_blocks_index_cache_total_size_bytes => thanos_store_index_cache_total_size_bytes{name="index-cache"}
    • cortex_<service>_blocks_index_cache_memcached_operations_total => thanos_memcached_operations_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_memcached_operation_failures_total => thanos_memcached_operation_failures_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_memcached_operation_duration_seconds => thanos_memcached_operation_duration_seconds{name="index-cache"}
    • cortex_<service>_blocks_index_cache_memcached_operation_skipped_total => thanos_memcached_operation_skipped_total{name="index-cache"}
  • [CHANGE] Experimental TSDB: Renamed metrics in bucket stores: #2627
    • cortex_<service>_blocks_meta_syncs_total => cortex_blocks_meta_syncs_total{component="<service>"}
    • cortex_<service>_blocks_meta_sync_failures_total => cortex_blocks_meta_sync_failures_total{component="<service>"}
    • cortex_<service>_blocks_meta_sync_duration_seconds => cortex_blocks_meta_sync_duration_seconds{component="<service>"}
    • cortex_<service>_blocks_meta_sync_consistency_delay_seconds => cortex_blocks_meta_sync_consistency_delay_seconds{component="<service>"}
    • cortex_<service>_blocks_meta_synced => cortex_blocks_meta_synced{component="<service>"}
    • cortex_<service>_bucket_store_block_loads_total => cortex_bucket_store_block_loads_total{component="<service>"}
    • cortex_<service>_bucket_store_block_load_failures_total => cortex_bucket_store_block_load_failures_total{component="<service>"}
    • cortex_<service>_bucket_store_block_drops_total => cortex_bucket_store_block_drops_total{component="<service>"}
    • cortex_<service>_bucket_store_block_drop_failures_total => cortex_bucket_store_block_drop_failures_total{component="<service>"}
    • cortex_<service>_bucket_store_blocks_loaded => cortex_bucket_store_blocks_loaded{component="<service>"}
    • cortex_<service>_bucket_store_series_data_touched => cortex_bucket_store_series_data_touched{component="<service>"}
    • cortex_<service>_bucket_store_series_data_fetched => cortex_bucket_store_series_data_fetched{component="<service>"}
    • cortex_<service>_bucket_store_series_data_size_touched_bytes => cortex_bucket_store_series_data_size_touched_bytes{component="<service>"}
    • cortex_<service>_bucket_store_series_data_size_fetched_bytes => cortex_bucket_store_series_data_size_fetched_bytes{component="<service>"}
    • cortex_<service>_bucket_store_series_blocks_queried => cortex_bucket_store_series_blocks_queried{component="<service>"}
    • cortex_<service>_bucket_store_series_get_all_duration_seconds => cortex_bucket_store_series_get_all_duration_seconds{component="<service>"}
    • cortex_<service>_bucket_store_series_merge_duration_seconds => cortex_bucket_store_series_merge_duration_seconds{component="<service>"}
    • cortex_<service>_bucket_store_series_refetches_total => cortex_bucket_store_series_refetches_total{component="<service>"}
    • cortex_<service>_bucket_store_series_result_series => cortex_bucket_store_series_result_series{component="<service>"}
    • cortex_<service>_bucket_store_cached_postings_compressions_total => cortex_bucket_store_cached_postings_compressions_total{component="<service>"}
    • cortex_<service>_bucket_store_cached_postings_compression_errors_total => cortex_bucket_store_cached_postings_compression_errors_total{component="<service>"}
    • cortex_<service>_bucket_store_cached_postings_compression_time_seconds => cortex_bucket_store_cached_postings_compression_time_seconds{component="<service>"}
    • cortex_<service>_bucket_store_cached_postings_original_size_bytes_total => cortex_bucket_store_cached_postings_original_size_bytes_total{component="<service>"}
    • cortex_<service>_bucket_store_cached_postings_compressed_size_bytes_total => cortex_bucket_store_cached_postings_compressed_size_bytes_total{component="<service>"}
    • cortex_<service>_blocks_sync_seconds => cortex_bucket_stores_blocks_sync_seconds{component="<service>"}
    • cortex_<service>_blocks_last_successful_sync_timestamp_seconds => cortex_bucket_stores_blocks_last_successful_sync_timestamp_seconds{component="<service>"}
  • [CHANGE] Available command-line flags are printed to stdout, and only when requested via -help. Using an invalid flag no longer causes printing of all available flags. #2691
  • [CHANGE] Experimental Memberlist ring: randomize gossip node names to avoid conflicts when running multiple clients on the same host, or reusing host names (e.g. pods in a statefulset). Node name randomization can be disabled by using -memberlist.randomize-node-name=false. #2715
  • [CHANGE] Memberlist KV client is no longer considered experimental. #2725
  • [CHANGE] Experimental Delete Series: Make delete request cancellation duration configurable. #2760
  • [CHANGE] Removed -store.fullsize-chunks option which was undocumented and unused (it broke ingester hand-overs). #2656
  • [CHANGE] A query with no metric name, which previously resulted in HTTP status code 500, now returns status code 422 instead. #2571
  • [FEATURE] TLS config options added for GRPC clients in Querier (Query-frontend client & Ingester client), Ruler, Store Gateway, as well as HTTP client in Config store client. #2502
  • [FEATURE] The flag -frontend.max-cache-freshness is now supported within the limits overrides, to specify per-tenant max cache freshness values. The corresponding YAML config parameter has been changed from results_cache.max_freshness to limits_config.max_cache_freshness. The legacy YAML config parameter (results_cache.max_freshness) will continue to be supported till Cortex release v1.4.0. #2609
  • [FEATURE] Experimental gRPC Store: Added support for 3rd-party index and chunk stores using gRPC client/server plugin mechanism. #2220
  • [FEATURE] Add -cassandra.table-options flag to customize table options of Cassandra when creating the index or chunk table. #2575
  • [ENHANCEMENT] Propagate GOPROXY value when building build-image. This helps when building the code in a network where the default Go proxy is not accessible (e.g. when behind a corporate VPN). #2741
  • [ENHANCEMENT] Querier: Added metric cortex_querier_request_duration_seconds for all requests to the querier. #2708
  • [ENHANCEMENT] Cortex is now built with Go 1.14. #2480 #2749 #2753
  • [ENHANCEMENT] Experimental TSDB: added the following metrics to the ingester: #2580 #2583 #2589 #2654
    • cortex_ingester_tsdb_appender_add_duration_seconds
    • cortex_ingester_tsdb_appender_commit_duration_seconds
    • cortex_ingester_tsdb_refcache_purge_duration_seconds
    • cortex_ingester_tsdb_compactions_total
    • cortex_ingester_tsdb_compaction_duration_seconds
    • cortex_ingester_tsdb_wal_fsync_duration_seconds
    • cortex_ingester_tsdb_wal_page_flushes_total
    • cortex_ingester_tsdb_wal_completed_pages_total
    • cortex_ingester_tsdb_wal_truncations_failed_total
    • cortex_ingester_tsdb_wal_truncations_total
    • cortex_ingester_tsdb_wal_writes_failed_total
    • cortex_ingester_tsdb_checkpoint_deletions_failed_total
    • cortex_ingester_tsdb_checkpoint_deletions_total
    • cortex_ingester_tsdb_checkpoint_creations_failed_total
    • cortex_ingester_tsdb_checkpoint_creations_total
    • cortex_ingester_tsdb_wal_truncate_duration_seconds
    • cortex_ingester_tsdb_head_active_appenders
    • cortex_ingester_tsdb_head_series_not_found_total
    • `co...
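The bucket store and index-cache metric renames from #2627 listed above follow a few regular patterns, which is handy when updating dashboards and alerts. Below is a minimal sketch of those patterns. The function name is hypothetical, and the service name is passed in explicitly because the notes only show it as the placeholder <service>.

```python
def rename_bucket_store_metric(old_name: str, service: str) -> str:
    """Map a pre-1.2.0 metric name to its 1.2.0 name and label (hypothetical helper)."""
    prefix = f"cortex_{service}_"
    if not old_name.startswith(prefix):
        raise ValueError(f"{old_name!r} does not start with {prefix!r}")
    rest = old_name[len(prefix):]

    index_cache = "blocks_index_cache_"
    if rest.startswith(index_cache + "memcached_"):
        # cortex_<service>_blocks_index_cache_memcached_X => thanos_memcached_X
        return "thanos_" + rest[len(index_cache):] + '{name="index-cache"}'
    if rest.startswith(index_cache):
        # cortex_<service>_blocks_index_cache_X => thanos_store_index_cache_X
        return "thanos_store_index_cache_" + rest[len(index_cache):] + '{name="index-cache"}'

    component = f'{{component="{service}"}}'
    if rest in ("blocks_sync_seconds", "blocks_last_successful_sync_timestamp_seconds"):
        # These two gained the "cortex_bucket_stores_" prefix.
        return "cortex_bucket_stores_" + rest + component
    if rest.startswith("blocks_meta_") or rest.startswith("bucket_store_"):
        # The service moved from the metric name into the component label.
        return "cortex_" + rest + component
    raise ValueError(f"{old_name!r} is not covered by the rename list")
```

Assuming a service called `querier` purely for illustration, `cortex_querier_bucket_store_blocks_loaded` maps to `cortex_bucket_store_blocks_loaded{component="querier"}`. Names outside the list raise an error instead of being guessed.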
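The query frontend change in #2553 above says a tenant queue is now chosen round-robin. As a generic illustration of that scheduling idea (not the Cortex implementation, and with hypothetical class and method names), a per-tenant queue can be sketched as:

```python
from collections import OrderedDict, deque
from typing import Any, Deque, Optional, Tuple


class TenantQueues:
    """Per-tenant FIFO queues served in round-robin order (illustrative only)."""

    def __init__(self) -> None:
        # Insertion order doubles as the round-robin order.
        self._queues: "OrderedDict[str, Deque[Any]]" = OrderedDict()

    def enqueue(self, tenant: str, request: Any) -> None:
        self._queues.setdefault(tenant, deque()).append(request)

    def dequeue(self) -> Optional[Tuple[str, Any]]:
        if not self._queues:
            return None
        # Serve the tenant at the front of the rotation.
        tenant, queue = next(iter(self._queues.items()))
        request = queue.popleft()
        if queue:
            # Still has work: move the tenant to the back of the rotation.
            self._queues.move_to_end(tenant)
        else:
            # Nothing left: drop the tenant until it enqueues again.
            del self._queues[tenant]
        return tenant, request
```

The point of the rotation is fairness: a tenant with many queued requests cannot starve a tenant with only a few, because each dequeue moves on to the next tenant.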

Cortex 1.2.0-rc.1

01 Jul 10:05
Pre-release

RC1 has one bugfix over RC0: #2796

This release has a number of bug-fixes and enhancements, particularly:

  • Memberlist KV client is no longer considered experimental. #2725
  • 3rd-party index and chunk stores using gRPC client/server plugin mechanism (experimental) #2220
  • Using an invalid flag no longer causes printing of all available flags. #2691 (my favourite change!)

Many thanks to all contributors.

Detailed list of changes:

  • [CHANGE] Metric cortex_kv_request_duration_seconds now includes name label to denote which client is being used as well as the backend label to denote the KV backend implementation in use. #2648
  • [CHANGE] Experimental Ruler: Rule groups persisted to object storage using the experimental API have an updated object key encoding to better handle special characters. Rule groups previously-stored using object storage must be renamed to the new format. #2646
  • [CHANGE] Query Frontend now uses Round Robin to choose a tenant queue to service next. #2553
  • [CHANGE] -promql.lookback-delta is now deprecated and has been replaced by -querier.lookback-delta along with lookback_delta entry under querier in the config file. -promql.lookback-delta will be removed in v1.4.0. #2604
  • [CHANGE] Experimental TSDB: removed -experimental.tsdb.bucket-store.binary-index-header-enabled flag. Now the binary index-header is always enabled.
  • [CHANGE] Experimental TSDB: Renamed index-cache metrics to use original metric names from Thanos, as Cortex is not aggregating them in any way: #2627
    • cortex_<service>_blocks_index_cache_items_evicted_total => thanos_store_index_cache_items_evicted_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_items_added_total => thanos_store_index_cache_items_added_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_requests_total => thanos_store_index_cache_requests_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_items_overflowed_total => thanos_store_index_cache_items_overflowed_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_hits_total => thanos_store_index_cache_hits_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_items => thanos_store_index_cache_items{name="index-cache"}
    • cortex_<service>_blocks_index_cache_items_size_bytes => thanos_store_index_cache_items_size_bytes{name="index-cache"}
    • cortex_<service>_blocks_index_cache_total_size_bytes => thanos_store_index_cache_total_size_bytes{name="index-cache"}
    • cortex_<service>_blocks_index_cache_memcached_operations_total => thanos_memcached_operations_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_memcached_operation_failures_total => thanos_memcached_operation_failures_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_memcached_operation_duration_seconds => thanos_memcached_operation_duration_seconds{name="index-cache"}
    • cortex_<service>_blocks_index_cache_memcached_operation_skipped_total => thanos_memcached_operation_skipped_total{name="index-cache"}
  • [CHANGE] Experimental TSDB: Renamed metrics in bucket stores: #2627
    • cortex_<service>_blocks_meta_syncs_total => cortex_blocks_meta_syncs_total{component="<service>"}
    • cortex_<service>_blocks_meta_sync_failures_total => cortex_blocks_meta_sync_failures_total{component="<service>"}
    • cortex_<service>_blocks_meta_sync_duration_seconds => cortex_blocks_meta_sync_duration_seconds{component="<service>"}
    • cortex_<service>_blocks_meta_sync_consistency_delay_seconds => cortex_blocks_meta_sync_consistency_delay_seconds{component="<service>"}
    • cortex_<service>_blocks_meta_synced => cortex_blocks_meta_synced{component="<service>"}
    • cortex_<service>_bucket_store_block_loads_total => cortex_bucket_store_block_loads_total{component="<service>"}
    • cortex_<service>_bucket_store_block_load_failures_total => cortex_bucket_store_block_load_failures_total{component="<service>"}
    • cortex_<service>_bucket_store_block_drops_total => cortex_bucket_store_block_drops_total{component="<service>"}
    • cortex_<service>_bucket_store_block_drop_failures_total => cortex_bucket_store_block_drop_failures_total{component="<service>"}
    • cortex_<service>_bucket_store_blocks_loaded => cortex_bucket_store_blocks_loaded{component="<service>"}
    • cortex_<service>_bucket_store_series_data_touched => cortex_bucket_store_series_data_touched{component="<service>"}
    • cortex_<service>_bucket_store_series_data_fetched => cortex_bucket_store_series_data_fetched{component="<service>"}
    • cortex_<service>_bucket_store_series_data_size_touched_bytes => cortex_bucket_store_series_data_size_touched_bytes{component="<service>"}
    • cortex_<service>_bucket_store_series_data_size_fetched_bytes => cortex_bucket_store_series_data_size_fetched_bytes{component="<service>"}
    • cortex_<service>_bucket_store_series_blocks_queried => cortex_bucket_store_series_blocks_queried{component="<service>"}
    • cortex_<service>_bucket_store_series_get_all_duration_seconds => cortex_bucket_store_series_get_all_duration_seconds{component="<service>"}
    • cortex_<service>_bucket_store_series_merge_duration_seconds => cortex_bucket_store_series_merge_duration_seconds{component="<service>"}
    • cortex_<service>_bucket_store_series_refetches_total => cortex_bucket_store_series_refetches_total{component="<service>"}
    • cortex_<service>_bucket_store_series_result_series => cortex_bucket_store_series_result_series{component="<service>"}
    • cortex_<service>_bucket_store_cached_postings_compressions_total => cortex_bucket_store_cached_postings_compressions_total{component="<service>"}
    • cortex_<service>_bucket_store_cached_postings_compression_errors_total => cortex_bucket_store_cached_postings_compression_errors_total{component="<service>"}
    • cortex_<service>_bucket_store_cached_postings_compression_time_seconds => cortex_bucket_store_cached_postings_compression_time_seconds{component="<service>"}
    • cortex_<service>_bucket_store_cached_postings_original_size_bytes_total => cortex_bucket_store_cached_postings_original_size_bytes_total{component="<service>"}
    • cortex_<service>_bucket_store_cached_postings_compressed_size_bytes_total => cortex_bucket_store_cached_postings_compressed_size_bytes_total{component="<service>"}
    • cortex_<service>_blocks_sync_seconds => cortex_bucket_stores_blocks_sync_seconds{component="<service>"}
    • cortex_<service>_blocks_last_successful_sync_timestamp_seconds => cortex_bucket_stores_blocks_last_successful_sync_timestamp_seconds{component="<service>"}
  • [CHANGE] Available command-line flags are printed to stdout, and only when requested via -help. Using an invalid flag no longer causes printing of all available flags. #2691
  • [CHANGE] Experimental Memberlist ring: randomize gossip node names to avoid conflicts when running multiple clients on the same host, or reusing host names (e.g. pods in a statefulset). Node name randomization can be disabled by using -memberlist.randomize-node-name=false. #2715
  • [CHANGE] Memberlist KV client is no longer considered experimental. #2725
  • [CHANGE] Experimental Delete Series: Make delete request cancellation duration configurable. #2760
  • [CHANGE] Removed -store.fullsize-chunks option which was undocumented and unused (it broke ingester hand-overs). #2656
  • [CHANGE] A query with no metric name, which previously resulted in HTTP status code 500, now returns status code 422 instead. #2571
  • [FEATURE] TLS config options added for GRPC clients in Querier (Query-frontend client & Ingester client), Ruler, Store Gateway, as well as HTTP client in Config store client. #2502
  • [FEATURE] The flag -frontend.max-cache-freshness is now supported within the limits overrides, to specify per-tenant max cache freshness values. The corresponding YAML config parameter has been changed from results_cache.max_freshness to limits_config.max_cache_freshness. The legacy YAML config parameter (results_cache.max_freshness) will continue to be supported till Cortex release v1.4.0. #2609
  • [FEATURE] Experimental gRPC Store: Added support for 3rd-party index and chunk stores using gRPC client/server plugin mechanism. #2220
  • [FEATURE] Add -cassandra.table-options flag to customize table options of Cassandra when creating the index or chunk table. #2575
  • [ENHANCEMENT] Propagate GOPROXY value when building build-image. This helps when building the code in a network where the default Go proxy is not accessible (e.g. when behind a corporate VPN). #2741
  • [ENHANCEMENT] Querier: Added metric cortex_querier_request_duration_seconds for all requests to the querier. #2708
  • [ENHANCEMENT] Cortex is now built with Go 1.14. #2480 #2749 #2753
  • [ENHANCEMENT] Experimental TSDB: added the following metrics to the ingester: #2580 #2583 #2589 #2654
    • cortex_ingester_tsdb_appender_add_duration_seconds
    • cortex_ingester_tsdb_appender_commit_duration_seconds
    • cortex_ingester_tsdb_refcache_purge_duration_seconds
    • cortex_ingester_tsdb_compactions_total
    • cortex_ingester_tsdb_compaction_duration_seconds
    • cortex_ingester_tsdb_wal_fsync_duration_seconds
    • cortex_ingester_tsdb_wal_page_flushes_total
    • cortex_ingester_tsdb_wal_completed_pages_total
    • cortex_ingester_tsdb_wal_truncations_failed_total
    • cortex_ingester_tsdb_wal_truncations_total
    • cortex_ingester_tsdb_wal_writes_failed_total
    • cortex_ingester_tsdb_checkpoint_deletions_failed_total
    • cortex_ingester_tsdb_checkpoint_deletions_total
    • cortex_ingester_tsdb_checkpoint_creations_failed_total
    • cortex_ingester_tsdb_checkpoint_creations_total
    • cortex_ingester_tsdb_wal_truncate_duration_seconds
    • cortex_ingester_tsdb_head_active_appenders
    • `cortex_ingester_tsd...

Cortex 1.2.0-rc.0

24 Jun 13:48
fe97558
Pre-release

This release has a number of bug-fixes and enhancements, particularly:

  • Memberlist KV client is no longer considered experimental. #2725
  • 3rd-party index and chunk stores using gRPC client/server plugin mechanism (experimental) #2220
  • Using an invalid flag no longer causes printing of all available flags. #2691 (my favourite change!)

Many thanks to all contributors.

Detailed list of changes:

  • [CHANGE] Metric cortex_kv_request_duration_seconds now includes name label to denote which client is being used as well as the backend label to denote the KV backend implementation in use. #2648
  • [CHANGE] Experimental Ruler: Rule groups persisted to object storage using the experimental API have an updated object key encoding to better handle special characters. Rule groups previously-stored using object storage must be renamed to the new format. #2646
  • [CHANGE] Query Frontend now uses Round Robin to choose a tenant queue to service next. #2553
  • [CHANGE] -promql.lookback-delta is now deprecated and has been replaced by -querier.lookback-delta along with lookback_delta entry under querier in the config file. -promql.lookback-delta will be removed in v1.4.0. #2604
  • [CHANGE] Experimental TSDB: removed -experimental.tsdb.bucket-store.binary-index-header-enabled flag. Now the binary index-header is always enabled.
  • [CHANGE] Experimental TSDB: Renamed index-cache metrics to use original metric names from Thanos, as Cortex is not aggregating them in any way: #2627
    • cortex_<service>_blocks_index_cache_items_evicted_total => thanos_store_index_cache_items_evicted_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_items_added_total => thanos_store_index_cache_items_added_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_requests_total => thanos_store_index_cache_requests_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_items_overflowed_total => thanos_store_index_cache_items_overflowed_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_hits_total => thanos_store_index_cache_hits_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_items => thanos_store_index_cache_items{name="index-cache"}
    • cortex_<service>_blocks_index_cache_items_size_bytes => thanos_store_index_cache_items_size_bytes{name="index-cache"}
    • cortex_<service>_blocks_index_cache_total_size_bytes => thanos_store_index_cache_total_size_bytes{name="index-cache"}
    • cortex_<service>_blocks_index_cache_memcached_operations_total => thanos_memcached_operations_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_memcached_operation_failures_total => thanos_memcached_operation_failures_total{name="index-cache"}
    • cortex_<service>_blocks_index_cache_memcached_operation_duration_seconds => thanos_memcached_operation_duration_seconds{name="index-cache"}
    • cortex_<service>_blocks_index_cache_memcached_operation_skipped_total => thanos_memcached_operation_skipped_total{name="index-cache"}
  • [CHANGE] Experimental TSDB: Renamed metrics in bucket stores: #2627
    • cortex_<service>_blocks_meta_syncs_total => cortex_blocks_meta_syncs_total{component="<service>"}
    • cortex_<service>_blocks_meta_sync_failures_total => cortex_blocks_meta_sync_failures_total{component="<service>"}
    • cortex_<service>_blocks_meta_sync_duration_seconds => cortex_blocks_meta_sync_duration_seconds{component="<service>"}
    • cortex_<service>_blocks_meta_sync_consistency_delay_seconds => cortex_blocks_meta_sync_consistency_delay_seconds{component="<service>"}
    • cortex_<service>_blocks_meta_synced => cortex_blocks_meta_synced{component="<service>"}
    • cortex_<service>_bucket_store_block_loads_total => cortex_bucket_store_block_loads_total{component="<service>"}
    • cortex_<service>_bucket_store_block_load_failures_total => cortex_bucket_store_block_load_failures_total{component="<service>"}
    • cortex_<service>_bucket_store_block_drops_total => cortex_bucket_store_block_drops_total{component="<service>"}
    • cortex_<service>_bucket_store_block_drop_failures_total => cortex_bucket_store_block_drop_failures_total{component="<service>"}
    • cortex_<service>_bucket_store_blocks_loaded => cortex_bucket_store_blocks_loaded{component="<service>"}
    • cortex_<service>_bucket_store_series_data_touched => cortex_bucket_store_series_data_touched{component="<service>"}
    • cortex_<service>_bucket_store_series_data_fetched => cortex_bucket_store_series_data_fetched{component="<service>"}
    • cortex_<service>_bucket_store_series_data_size_touched_bytes => cortex_bucket_store_series_data_size_touched_bytes{component="<service>"}
    • cortex_<service>_bucket_store_series_data_size_fetched_bytes => cortex_bucket_store_series_data_size_fetched_bytes{component="<service>"}
    • cortex_<service>_bucket_store_series_blocks_queried => cortex_bucket_store_series_blocks_queried{component="<service>"}
    • cortex_<service>_bucket_store_series_get_all_duration_seconds => cortex_bucket_store_series_get_all_duration_seconds{component="<service>"}
    • cortex_<service>_bucket_store_series_merge_duration_seconds => cortex_bucket_store_series_merge_duration_seconds{component="<service>"}
    • cortex_<service>_bucket_store_series_refetches_total => cortex_bucket_store_series_refetches_total{component="<service>"}
    • cortex_<service>_bucket_store_series_result_series => cortex_bucket_store_series_result_series{component="<service>"}
    • cortex_<service>_bucket_store_cached_postings_compressions_total => cortex_bucket_store_cached_postings_compressions_total{component="<service>"}
    • cortex_<service>_bucket_store_cached_postings_compression_errors_total => cortex_bucket_store_cached_postings_compression_errors_total{component="<service>"}
    • cortex_<service>_bucket_store_cached_postings_compression_time_seconds => cortex_bucket_store_cached_postings_compression_time_seconds{component="<service>"}
    • cortex_<service>_bucket_store_cached_postings_original_size_bytes_total => cortex_bucket_store_cached_postings_original_size_bytes_total{component="<service>"}
    • cortex_<service>_bucket_store_cached_postings_compressed_size_bytes_total => cortex_bucket_store_cached_postings_compressed_size_bytes_total{component="<service>"}
    • cortex_<service>_blocks_sync_seconds => cortex_bucket_stores_blocks_sync_seconds{component="<service>"}
    • cortex_<service>_blocks_last_successful_sync_timestamp_seconds => cortex_bucket_stores_blocks_last_successful_sync_timestamp_seconds{component="<service>"}
  • [CHANGE] Available command-line flags are printed to stdout, and only when requested via -help. Using an invalid flag no longer causes printing of all available flags. #2691
  • [CHANGE] Experimental Memberlist ring: randomize gossip node names to avoid conflicts when running multiple clients on the same host, or reusing host names (eg. pods in statefulset). Node name randomization can be disabled by using -memberlist.randomize-node-name=false. #2715
  • [CHANGE] Memberlist KV client is no longer considered experimental. #2725
  • [CHANGE] Change target flag for purger from data-purger to purger and make delete request cancellation duration configurable. #2760
  • [CHANGE] Removed -store.fullsize-chunks option which was undocumented and unused (it broke ingester hand-overs). #2656
  • [CHANGE] A query with no metric name, which previously resulted in HTTP status code 500, now returns status code 422 instead. #2571
  • [FEATURE] TLS config options added for GRPC clients in Querier (Query-frontend client & Ingester client), Ruler, Store Gateway, as well as HTTP client in Config store client. #2502
  • [FEATURE] The flag -frontend.max-cache-freshness is now supported within the limits overrides, to specify per-tenant max cache freshness values. The corresponding YAML config parameter has been changed from results_cache.max_freshness to limits_config.max_cache_freshness. The legacy YAML config parameter (results_cache.max_freshness) will continue to be supported till Cortex release v1.4.0. #2609
  • [FEATURE] Experimental gRPC Store: Added support for 3rd-party index and chunk stores using gRPC client/server plugin mechanism. #2220
  • [ENHANCEMENT] Propagate GOPROXY value when building build-image. This helps builders building the code in a network where the default Go proxy is not accessible (e.g. when behind some corporate VPN). #2741
  • [ENHANCEMENT] Querier: Added metric cortex_querier_request_duration_seconds for all requests to the querier. #2708
  • [ENHANCEMENT] Cortex is now built with Go 1.14. #2480 #2749 #2753
  • [ENHANCEMENT] Experimental TSDB: added the following metrics to the ingester: #2580 #2583 #2589 #2654
    • cortex_ingester_tsdb_appender_add_duration_seconds
    • cortex_ingester_tsdb_appender_commit_duration_seconds
    • cortex_ingester_tsdb_refcache_purge_duration_seconds
    • cortex_ingester_tsdb_compactions_total
    • cortex_ingester_tsdb_compaction_duration_seconds
    • cortex_ingester_tsdb_wal_fsync_duration_seconds
    • cortex_ingester_tsdb_wal_page_flushes_total
    • cortex_ingester_tsdb_wal_completed_pages_total
    • cortex_ingester_tsdb_wal_truncations_failed_total
    • cortex_ingester_tsdb_wal_truncations_total
    • cortex_ingester_tsdb_wal_writes_failed_total
    • cortex_ingester_tsdb_checkpoint_deletions_failed_total
    • cortex_ingester_tsdb_checkpoint_deletions_total
    • cortex_ingester_tsdb_checkpoint_creations_failed_total
    • cortex_ingester_tsdb_checkpoint_creations_total
    • cortex_ingester_tsdb_wal_truncate_duration_seconds
    • cortex_ingester_tsdb_head_active_appenders
    • cortex_ingester_tsdb_head_series_not_found_total
    • cortex_ingester_tsdb_head_chunks
    • cortex_ingester_tsdb_mmap_chunk_corruptions_total
    • `cort...
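The metric renames listed above follow a small number of patterns, so dashboards and alerts can be updated mechanically. The following is a minimal sketch, not part of Cortex: the helper name rename_metric is hypothetical, and the service value passed in stands for the <service> placeholder used in the list (the label value is assumed to be the same string as the old prefix).

```python
# Hypothetical helper illustrating the rename patterns from the list above.
# It is not a Cortex API; it only rewrites metric names as strings.

# Two metrics move to a "cortex_bucket_stores_" prefix instead of following
# the general "drop the service prefix" rule.
_SPECIAL = {
    "blocks_sync_seconds": "cortex_bucket_stores_blocks_sync_seconds",
    "blocks_last_successful_sync_timestamp_seconds":
        "cortex_bucket_stores_blocks_last_successful_sync_timestamp_seconds",
}


def rename_metric(old_name: str, service: str) -> str:
    """Return the new selector for an old cortex_<service>_* metric name.

    Names that do not match any listed pattern are returned unchanged.
    """
    prefix = f"cortex_{service}_"
    if not old_name.startswith(prefix):
        return old_name
    suffix = old_name[len(prefix):]

    # Index-cache memcached metrics now use the Thanos memcached names.
    memcached = "blocks_index_cache_memcached_"
    if suffix.startswith(memcached):
        return f'thanos_memcached_{suffix[len(memcached):]}{{name="index-cache"}}'

    # Remaining index-cache metrics now use the Thanos store names.
    index_cache = "blocks_index_cache_"
    if suffix.startswith(index_cache):
        return f'thanos_store_index_cache_{suffix[len(index_cache):]}{{name="index-cache"}}'

    if suffix in _SPECIAL:
        return f'{_SPECIAL[suffix]}{{component="{service}"}}'

    # Bucket store and meta sync metrics keep their name without the service
    # prefix and gain a component label.
    if suffix.startswith(("blocks_meta_", "bucket_store_")):
        return f'cortex_{suffix}{{component="{service}"}}'

    return old_name


if __name__ == "__main__":
    print(rename_metric("cortex_querier_bucket_store_blocks_loaded", "querier"))
```

Any query that aggregates across components may also need a component matcher added after the rename, since the new names are shared between services.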

1.1.0 / 2020-05-21

21 May 09:45
v1.1.0
9db28c1

This release brings the usual mix of bugfixes and improvements. The biggest change is that WAL support for chunks is now considered to be production-ready!

Please make sure to review renamed metrics, and update your dashboards and alerts accordingly.

  • [CHANGE] Added v1 API routes documented in #2327. #2372
    • Added -http.alertmanager-http-prefix flag which allows the configuration of the path where the Alertmanager API and UI can be reached. The default is set to /alertmanager.
    • Added -http.prometheus-http-prefix flag which allows the configuration of the path where the Prometheus API and UI can be reached. The default is set to /prometheus.
    • Updated the index hosted at the root prefix to point to the updated routes.
    • Legacy routes hardcoded with the /api/prom prefix now respect the -http.prefix flag.
  • [CHANGE] The metrics cortex_distributor_ingester_appends_total and cortex_distributor_ingester_append_failures_total now include a type label to differentiate between samples and metadata. #2336
  • [CHANGE] The metrics for number of chunks and bytes flushed to the chunk store are renamed. Note that previous metrics were counted pre-deduplication, while new metrics are counted after deduplication. #2463
    • cortex_ingester_chunks_stored_total > cortex_chunk_store_stored_chunks_total
    • cortex_ingester_chunk_stored_bytes_total > cortex_chunk_store_stored_chunk_bytes_total
  • [CHANGE] Experimental TSDB: renamed blocks meta fetcher metrics: #2375
    • cortex_querier_bucket_store_blocks_meta_syncs_total > cortex_querier_blocks_meta_syncs_total
    • cortex_querier_bucket_store_blocks_meta_sync_failures_total > cortex_querier_blocks_meta_sync_failures_total
    • cortex_querier_bucket_store_blocks_meta_sync_duration_seconds > cortex_querier_blocks_meta_sync_duration_seconds
    • cortex_querier_bucket_store_blocks_meta_sync_consistency_delay_seconds > cortex_querier_blocks_meta_sync_consistency_delay_seconds
  • [CHANGE] Experimental TSDB: Modified default values for -compactor.deletion-delay option from 48h to 12h and -experimental.tsdb.bucket-store.ignore-deletion-marks-delay from 24h to 6h. #2414
  • [CHANGE] WAL: Default value of -ingester.checkpoint-enabled changed to true. #2416
  • [CHANGE] trace_id field in log files has been renamed to traceID. #2518
  • [CHANGE] Slow query log has a different output now. Previously used url field has been replaced with host and path, and query parameters are logged as individual log fields with qs_ prefix. #2520
  • [CHANGE] WAL: WAL and checkpoint compression is now disabled. #2436
  • [CHANGE] Update in dependency go-kit/kit from v0.9.0 to v0.10.0. HTML escaping disabled in JSON Logger. #2535
  • [CHANGE] Experimental TSDB: Removed cortex_<service>_ prefix from Thanos objstore metrics and added component label to distinguish which Cortex component is doing API calls to the object storage when running in single-binary mode: #2568
    • cortex_<service>_thanos_objstore_bucket_operations_total renamed to thanos_objstore_bucket_operations_total{component="<name>"}
    • cortex_<service>_thanos_objstore_bucket_operation_failures_total renamed to thanos_objstore_bucket_operation_failures_total{component="<name>"}
    • cortex_<service>_thanos_objstore_bucket_operation_duration_seconds renamed to thanos_objstore_bucket_operation_duration_seconds{component="<name>"}
    • cortex_<service>_thanos_objstore_bucket_last_successful_upload_time renamed to thanos_objstore_bucket_last_successful_upload_time{component="<name>"}
  • [CHANGE] FIFO cache: The -<prefix>.fifocache.size CLI flag has been renamed to -<prefix>.fifocache.max-size-items as well as its YAML config option size renamed to max_size_items. #2319
  • [FEATURE] Ruler: The -ruler.evaluation-delay flag was added to allow users to configure a default evaluation delay for all rules in cortex. The default value is 0 which is the current behavior. #2423
  • [FEATURE] Experimental: Added a new object storage client for OpenStack Swift. #2440
  • [FEATURE] TLS config options added to the Server. #2535
  • [FEATURE] Experimental: Added support for /api/v1/metadata Prometheus-based endpoint. #2549
  • [FEATURE] Add ability to limit concurrent queries to Cassandra with -cassandra.query-concurrency flag. #2562
  • [ENHANCEMENT] Experimental TSDB: sample ingestion errors are now reported via existing cortex_discarded_samples_total metric. #2370
  • [ENHANCEMENT] Failures on samples at distributors and ingesters return the first validation error as opposed to the last. #2383
  • [ENHANCEMENT] Experimental TSDB: Added cortex_querier_blocks_meta_synced, which reflects current state of synced blocks over all tenants. #2392
  • [ENHANCEMENT] Added cortex_distributor_latest_seen_sample_timestamp_seconds metric to see how far behind Prometheus servers are in sending data. #2371
  • [ENHANCEMENT] FIFO cache to support eviction based on memory usage. Added -<prefix>.fifocache.max-size-bytes CLI flag and YAML config option max_size_bytes to specify memory limit of the cache. #2319, #2527
  • [ENHANCEMENT] Added -querier.worker-match-max-concurrent. Force worker concurrency to match the -querier.max-concurrent option. Overrides -querier.worker-parallelism. #2456
  • [ENHANCEMENT] Added the following metrics for monitoring delete requests: #2445
    • cortex_purger_delete_requests_received_total: Number of delete requests received per user.
    • cortex_purger_delete_requests_processed_total: Number of delete requests processed per user.
    • cortex_purger_delete_requests_chunks_selected_total: Number of chunks selected while building delete plans per user.
    • cortex_purger_delete_requests_processing_failures_total: Number of delete requests processing failures per user.
  • [ENHANCEMENT] Single Binary: Added query-frontend to the single binary. Single binary users will now benefit from various query-frontend features. Primarily: sharding, parallelization, load shedding, additional caching (if configured), and query retries. #2437
  • [ENHANCEMENT] Allow 1w (where w denotes week) and 1y (where y denotes year) when setting -store.cache-lookups-older-than and -store.max-look-back-period. #2454
  • [ENHANCEMENT] Optimize index queries for matchers using "a|b|c"-type regex. #2446 #2475
  • [ENHANCEMENT] Added per tenant metrics for queries and chunks and bytes read from chunk store: #2463
    • cortex_chunk_store_fetched_chunks_total and cortex_chunk_store_fetched_chunk_bytes_total
    • cortex_query_frontend_queries_total (per tenant queries counted by the frontend)
  • [ENHANCEMENT] WAL: New metrics cortex_ingester_wal_logged_bytes_total and cortex_ingester_checkpoint_logged_bytes_total added to track total bytes logged to disk for WAL and checkpoints. #2497
  • [ENHANCEMENT] Add de-duplicated chunks counter cortex_chunk_store_deduped_chunks_total which counts every chunk not sent to the store because it was already sent by another replica. #2485
  • [ENHANCEMENT] Query-frontend now also logs the POST data of long queries. #2481
  • [ENHANCEMENT] WAL: Ingester WAL records now have type header and the custom WAL records have been replaced by Prometheus TSDB's WAL records. Old records will not be supported from 1.3 onwards. Note: once this is deployed, you cannot downgrade without data loss. #2436
  • [ENHANCEMENT] Redis Cache: Added idle_timeout, wait_on_pool_exhaustion and max_conn_lifetime options to redis cache configuration. #2550
  • [ENHANCEMENT] WAL: the experimental tag has been removed on the WAL in ingesters. #2560
  • [ENHANCEMENT] Use newer AWS API for paginated queries - removes 'Deprecated' message from logfiles. #2452
  • [BUGFIX] Ruler: Ensure temporary rule files with special characters are properly mapped and cleaned up. #2506
  • [BUGFIX] Ensure requests are properly routed to the prometheus api embedded in the query if -server.path-prefix is set. Fixes #2411. #2372
  • [BUGFIX] Experimental TSDB: Fixed chunk data corruption when querying back series using the experimental blocks storage. #2400
  • [BUGFIX] Cassandra Storage: Fix endpoint TLS host verification. #2109
  • [BUGFIX] Experimental TSDB: Fixed response status code from 422 to 500 when an error occurs while iterating chunks with the experimental blocks storage. #2402
  • [BUGFIX] Ring: Fixed a situation where upgrading from pre-1.0 cortex with a rolling strategy caused new 1.0 ingesters to lose their zone value in the ring until manually forced to re-register. #2404
  • [BUGFIX] Distributor: /all_user_stats now shows API and Rule Ingest Rate correctly. #2457
  • [BUGFIX] Fixed version, revision and branch labels exported by the cortex_build_info metric. #2468
  • [BUGFIX] QueryFrontend: fixed a situation where the span context was missing when downstream_url is used. #2539
  • [BUGFIX] Querier: Fixed a situation where querier would crash because of an unresponsive frontend instance. #2569
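As an illustration of the 1w and 1y duration values now accepted by -store.cache-lookups-older-than and -store.max-look-back-period, here is a minimal sketch of a single-unit duration parser. It is not the Cortex implementation: the function name parse_duration is hypothetical, and the conversion of a week to 7 days and a year to 365 days is an assumption of this sketch.

```python
import re
from datetime import timedelta

# Seconds per unit. Treating "w" as 7 days and "y" as 365 days is an
# assumption made for this sketch.
_UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 60 * 60,
    "d": 24 * 60 * 60,
    "w": 7 * 24 * 60 * 60,
    "y": 365 * 24 * 60 * 60,
}

# A whole number followed by exactly one unit, for example "1w" or "12h".
_DURATION = re.compile(r"(\d+)([smhdwy])")


def parse_duration(text: str) -> timedelta:
    """Parse a single-unit duration string such as "1w" or "1y"."""
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(seconds=int(value) * _UNIT_SECONDS[unit])


if __name__ == "__main__":
    print(parse_duration("1w"))
    print(parse_duration("1y"))
```

The sketch deliberately rejects values that combine several units, so a string like 168h5m0s raises ValueError rather than being partially parsed.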

1.1.0-rc.0 / 2020-05-13

13 May 12:38
v1.1.0-rc.0
a3ca74d
Pre-release

This release brings the usual mix of bugfixes and improvements. The biggest change is that WAL support for chunks is now considered to be production-ready!

Please make sure to review renamed metrics, and update your dashboards and alerts accordingly.

  • [CHANGE] Added v1 API routes documented in #2327. #2372
    • Added -http.alertmanager-http-prefix flag which allows the configuration of the path where the Alertmanager API and UI can be reached. The default is set to /alertmanager.
    • Added -http.prometheus-http-prefix flag which allows the configuration of the path where the Prometheus API and UI can be reached. The default is set to /prometheus.
    • Updated the index hosted at the root prefix to point to the updated routes.
    • Legacy routes hardcoded with the /api/prom prefix now respect the -http.prefix flag.
  • [CHANGE] The metrics cortex_distributor_ingester_appends_total and cortex_distributor_ingester_append_failures_total now include a type label to differentiate between samples and metadata. #2336
  • [CHANGE] The metrics for number of chunks and bytes flushed to the chunk store are renamed. Note that previous metrics were counted pre-deduplication, while new metrics are counted after deduplication. #2463
    • cortex_ingester_chunks_stored_total > cortex_chunk_store_stored_chunks_total
    • cortex_ingester_chunk_stored_bytes_total > cortex_chunk_store_stored_chunk_bytes_total
  • [CHANGE] Experimental TSDB: renamed blocks meta fetcher metrics: #2375
    • cortex_querier_bucket_store_blocks_meta_syncs_total > cortex_querier_blocks_meta_syncs_total
    • cortex_querier_bucket_store_blocks_meta_sync_failures_total > cortex_querier_blocks_meta_sync_failures_total
    • cortex_querier_bucket_store_blocks_meta_sync_duration_seconds > cortex_querier_blocks_meta_sync_duration_seconds
    • cortex_querier_bucket_store_blocks_meta_sync_consistency_delay_seconds > cortex_querier_blocks_meta_sync_consistency_delay_seconds
  • [CHANGE] Experimental TSDB: Modified default values for -compactor.deletion-delay option from 48h to 12h and -experimental.tsdb.bucket-store.ignore-deletion-marks-delay from 24h to 6h. #2414
  • [CHANGE] Experimental WAL: Default value of -ingester.checkpoint-enabled changed to true. #2416
  • [CHANGE] trace_id field in log files has been renamed to traceID. #2518
  • [CHANGE] Slow query log has a different output now. Previously used url field has been replaced with host and path, and query parameters are logged as individual log fields with qs_ prefix. #2520
  • [CHANGE] Experimental WAL: WAL and checkpoint compression is now disabled. #2436
  • [CHANGE] Update in dependency go-kit/kit from v0.9.0 to v0.10.0. HTML escaping disabled in JSON Logger. #2535
  • [CHANGE] Experimental TSDB: Removed cortex_<service>_ prefix from Thanos objstore metrics and added component label to distinguish which Cortex component is doing API calls to the object storage when running in single-binary mode: #2568
    • cortex_<service>_thanos_objstore_bucket_operations_total renamed to thanos_objstore_bucket_operations_total{component="<name>"}
    • cortex_<service>_thanos_objstore_bucket_operation_failures_total renamed to thanos_objstore_bucket_operation_failures_total{component="<name>"}
    • cortex_<service>_thanos_objstore_bucket_operation_duration_seconds renamed to thanos_objstore_bucket_operation_duration_seconds{component="<name>"}
    • cortex_<service>_thanos_objstore_bucket_last_successful_upload_time renamed to thanos_objstore_bucket_last_successful_upload_time{component="<name>"}
  • [CHANGE] FIFO cache: The -<prefix>.fifocache.size CLI flag has been renamed to -<prefix>.fifocache.max-size-items as well as its YAML config option size renamed to max_size_items. #2319
  • [FEATURE] Ruler: The -ruler.evaluation-delay flag was added to allow users to configure a default evaluation delay for all rules in cortex. The default value is 0 which is the current behavior. #2423
  • [FEATURE] Experimental: Added a new object storage client for OpenStack Swift. #2440
  • [FEATURE] TLS config options added to the Server. #2535
  • [FEATURE] Experimental: Added support for /api/v1/metadata Prometheus-based endpoint. #2549
  • [FEATURE] Add ability to limit concurrent queries to Cassandra with -cassandra.query-concurrency flag. #2562
  • [ENHANCEMENT] Experimental TSDB: sample ingestion errors are now reported via existing cortex_discarded_samples_total metric. #2370
  • [ENHANCEMENT] Failures on samples at distributors and ingesters return the first validation error as opposed to the last. #2383
  • [ENHANCEMENT] Experimental TSDB: Added cortex_querier_blocks_meta_synced, which reflects current state of synced blocks over all tenants. #2392
  • [ENHANCEMENT] Added cortex_distributor_latest_seen_sample_timestamp_seconds metric to see how far behind Prometheus servers are in sending data. #2371
  • [ENHANCEMENT] FIFO cache to support eviction based on memory usage. Added -<prefix>.fifocache.max-size-bytes CLI flag and YAML config option max_size_bytes to specify memory limit of the cache. #2319, #2527
  • [ENHANCEMENT] Added -querier.worker-match-max-concurrent. Force worker concurrency to match the -querier.max-concurrent option. Overrides -querier.worker-parallelism. #2456
  • [ENHANCEMENT] Added the following metrics for monitoring delete requests: #2445
    • cortex_purger_delete_requests_received_total: Number of delete requests received per user.
    • cortex_purger_delete_requests_processed_total: Number of delete requests processed per user.
    • cortex_purger_delete_requests_chunks_selected_total: Number of chunks selected while building delete plans per user.
    • cortex_purger_delete_requests_processing_failures_total: Number of delete requests processing failures per user.
  • [ENHANCEMENT] Single Binary: Added query-frontend to the single binary. Single binary users will now benefit from various query-frontend features. Primarily: sharding, parallelization, load shedding, additional caching (if configured), and query retries. #2437
  • [ENHANCEMENT] Allow 1w (where w denotes week) and 1y (where y denotes year) when setting -store.cache-lookups-older-than and -store.max-look-back-period. #2454
  • [ENHANCEMENT] Optimize index queries for matchers using "a|b|c"-type regex. #2446 #2475
  • [ENHANCEMENT] Added per tenant metrics for queries and chunks and bytes read from chunk store: #2463
    • cortex_chunk_store_fetched_chunks_total and cortex_chunk_store_fetched_chunk_bytes_total
    • cortex_query_frontend_queries_total (per tenant queries counted by the frontend)
  • [ENHANCEMENT] WAL: New metrics cortex_ingester_wal_logged_bytes_total and cortex_ingester_checkpoint_logged_bytes_total added to track total bytes logged to disk for WAL and checkpoints. #2497
  • [ENHANCEMENT] Add de-duplicated chunks counter cortex_chunk_store_deduped_chunks_total which counts every chunk not sent to the store because it was already sent by another replica. #2485
  • [ENHANCEMENT] Query-frontend now also logs the POST data of long queries. #2481
  • [ENHANCEMENT] WAL: Ingester WAL records now have type header and the custom WAL records have been replaced by Prometheus TSDB's WAL records. Old records will not be supported from 1.3 onwards. Note: once this is deployed, you cannot downgrade without data loss. #2436
  • [ENHANCEMENT] Redis Cache: Added idle_timeout, wait_on_pool_exhaustion and max_conn_lifetime options to redis cache configuration. #2550
  • [ENHANCEMENT] WAL: the experimental tag has been removed on the WAL in ingesters. #2560
  • [ENHANCEMENT] Use newer AWS API for paginated queries - removes 'Deprecated' message from logfiles. #2452
  • [BUGFIX] Ruler: Ensure temporary rule files with special characters are properly mapped and cleaned up. #2506
  • [BUGFIX] Ensure requests are properly routed to the prometheus api embedded in the query if -server.path-prefix is set. Fixes #2411. #2372
  • [BUGFIX] Experimental TSDB: Fixed chunk data corruption when querying back series using the experimental blocks storage. #2400
  • [BUGFIX] Cassandra Storage: Fix endpoint TLS host verification. #2109
  • [BUGFIX] Experimental TSDB: Fixed response status code from 422 to 500 when an error occurs while iterating chunks with the experimental blocks storage. #2402
  • [BUGFIX] Ring: Fixed a situation where upgrading from pre-1.0 cortex with a rolling strategy caused new 1.0 ingesters to lose their zone value in the ring until manually forced to re-register. #2404
  • [BUGFIX] Distributor: /all_user_stats now shows API and Rule Ingest Rate correctly. #2457
  • [BUGFIX] Fixed version, revision and branch labels exported by the cortex_build_info metric. #2468
  • [BUGFIX] QueryFrontend: fixed a situation where an HTTP error was ignored and an incorrect status code was set. #2483
  • [BUGFIX] QueryFrontend: fixed a situation where the span context was missing when downstream_url is used. #2539
  • [BUGFIX] Querier: Fixed a situation where querier would crash because of an unresponsive frontend instance. #2569
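To illustrate the new slow query log layout described above (host and path instead of url, plus one qs_-prefixed field per query parameter), here is a minimal sketch. It is not taken from Cortex: the function name slow_query_log_fields and the example URL are hypothetical, and the handling of repeated parameters (last value wins) is an assumption of this sketch.

```python
from urllib.parse import parse_qsl, urlsplit


def slow_query_log_fields(url: str) -> dict:
    """Split a request URL into host, path and qs_-prefixed parameter fields."""
    parts = urlsplit(url)
    fields = {"host": parts.netloc, "path": parts.path}
    # Each query parameter becomes its own field. If a parameter repeats,
    # the last value overwrites earlier ones (an assumption of this sketch).
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        fields[f"qs_{key}"] = value
    return fields


if __name__ == "__main__":
    # Hypothetical request URL used only for illustration.
    print(slow_query_log_fields("http://cortex.example/api/v1/query?query=up&time=1"))
```

Log parsers that previously extracted the query from the url field would instead read the individual qs_ fields.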

1.0.1 / 2020-04-23

23 Apr 14:30
v1.0.1
6d72700

In a cluster with 3 ingester replicas, when rollouts happen or when there are only 2 replicas available, you might see gaps in your queries. This release fixes that bug.

  • [BUGFIX] Fix gaps when querying ingesters with replication factor = 3 and 2 ingesters in the cluster. #2503

1.0.0 / 2020-04-02

02 Apr 12:04
v1.0.0
b8efa9a

This is the first major release of Cortex. We made a lot of breaking changes in this release which have been detailed below. Please also see the stability guarantees we provide as part of a major release: https://cortexmetrics.io/docs/configuration/v1guarantees/.

  • [CHANGE] Remove the following deprecated flags: #2339

    • -metrics.error-rate-query (use -metrics.write-throttle-query instead).
    • -store.cardinality-cache-size (use -store.index-cache-read.enable-fifocache and -store.index-cache-read.fifocache.size instead).
    • -store.cardinality-cache-validity (use -store.index-cache-read.enable-fifocache and -store.index-cache-read.fifocache.duration instead).
    • -distributor.limiter-reload-period (flag unused)
    • -ingester.claim-on-rollout (flag unused)
    • -ingester.normalise-tokens (flag unused)
  • [CHANGE] Renamed YAML file options to be more consistent. See full config file changes below. #2273

  • [CHANGE] AWS based autoscaling has been removed. You can only use metrics based autoscaling now. -applicationautoscaling.url has been removed. See https://cortexmetrics.io/docs/guides/aws/#dynamodb-capacity-provisioning on how to migrate. #2328

  • [CHANGE] Renamed the memcache.write-back-goroutines and memcache.write-back-buffer flags to background.write-back-concurrency and background.write-back-buffer. This affects the following flags: #2241

    • -frontend.memcache.write-back-buffer --> -frontend.background.write-back-buffer
    • -frontend.memcache.write-back-goroutines --> -frontend.background.write-back-concurrency
    • -store.index-cache-read.memcache.write-back-buffer --> -store.index-cache-read.background.write-back-buffer
    • -store.index-cache-read.memcache.write-back-goroutines --> -store.index-cache-read.background.write-back-concurrency
    • -store.index-cache-write.memcache.write-back-buffer --> -store.index-cache-write.background.write-back-buffer
    • -store.index-cache-write.memcache.write-back-goroutines --> -store.index-cache-write.background.write-back-concurrency
    • -memcache.write-back-buffer --> -store.chunks-cache.background.write-back-buffer. See the next changelog entry for the difference.
    • -memcache.write-back-goroutines --> -store.chunks-cache.background.write-back-concurrency. See the next changelog entry for the difference.
  • [CHANGE] Renamed the chunk cache flags to have store.chunks-cache. as prefix. This means the following flags have been changed: #2241

    • -cache.enable-fifocache --> -store.chunks-cache.cache.enable-fifocache
    • -default-validity --> -store.chunks-cache.default-validity
    • -fifocache.duration --> -store.chunks-cache.fifocache.duration
    • -fifocache.size --> -store.chunks-cache.fifocache.size
    • -memcache.write-back-buffer --> -store.chunks-cache.background.write-back-buffer. See the previous changelog entry for the difference.
    • -memcache.write-back-goroutines --> -store.chunks-cache.background.write-back-concurrency. See the previous changelog entry for the difference.
    • -memcached.batchsize --> -store.chunks-cache.memcached.batchsize
    • -memcached.consistent-hash --> -store.chunks-cache.memcached.consistent-hash
    • -memcached.expiration --> -store.chunks-cache.memcached.expiration
    • -memcached.hostname --> -store.chunks-cache.memcached.hostname
    • -memcached.max-idle-conns --> -store.chunks-cache.memcached.max-idle-conns
    • -memcached.parallelism --> -store.chunks-cache.memcached.parallelism
    • -memcached.service --> -store.chunks-cache.memcached.service
    • -memcached.timeout --> -store.chunks-cache.memcached.timeout
    • -memcached.update-interval --> -store.chunks-cache.memcached.update-interval
    • -redis.enable-tls --> -store.chunks-cache.redis.enable-tls
    • -redis.endpoint --> -store.chunks-cache.redis.endpoint
    • -redis.expiration --> -store.chunks-cache.redis.expiration
    • -redis.max-active-conns --> -store.chunks-cache.redis.max-active-conns
    • -redis.max-idle-conns --> -store.chunks-cache.redis.max-idle-conns
    • -redis.password --> -store.chunks-cache.redis.password
    • -redis.timeout --> -store.chunks-cache.redis.timeout
  • [CHANGE] Rename the -store.chunk-cache-stubs to -store.chunks-cache.cache-stubs to be more in line with the above. #2241

  • [CHANGE] Change prefix of flags -dynamodb.periodic-table.* to -table-manager.index-table.*. #2359

  • [CHANGE] Change prefix of flags -dynamodb.chunk-table.* to -table-manager.chunk-table.*. #2359

  • [CHANGE] Change the following flags: #2359

    • -dynamodb.poll-interval --> -table-manager.poll-interval
    • -dynamodb.periodic-table.grace-period --> -table-manager.periodic-table.grace-period
  • [CHANGE] Renamed the following flags: #2273

    • -dynamodb.chunk.gang.size --> -dynamodb.chunk-gang-size
    • -dynamodb.chunk.get.max.parallelism --> -dynamodb.chunk-get-max-parallelism
  • [CHANGE] Durations with mixed time units are no longer supported. For example, 168h5m0s no longer works; use a single unit (s|m|h|d|w|y). #2252
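
To check existing configuration values against this rule before upgrading, something like the following could be used. This is a minimal sketch, not the parser Cortex uses: the `is_single_unit_duration` helper is hypothetical, and it assumes a duration is a whole number followed by exactly one of the units listed above.

```python
import re

# A whole number followed by exactly one of the units s, m, h, d, w, y.
SINGLE_UNIT = re.compile(r"[0-9]+(s|m|h|d|w|y)")


def is_single_unit_duration(text):
    """Return True if text uses a single time unit, e.g. "168h" or "1w"."""
    return SINGLE_UNIT.fullmatch(text) is not None


for value in ["168h", "168h5m0s", "1w"]:
    print(value, is_single_unit_duration(value))
```

Under that assumption, a mixed value such as 168h5m0s would have to be rewritten in one unit, for example 10085m.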

  • [CHANGE] Utilize separate protos for rule state and storage. Experimental ruler API will not be functional until the rollout is complete. #2226

  • [CHANGE] Frontend worker in querier now starts after all Querier module dependencies are started. This fixes an issue where the frontend worker started to send queries to the querier before it was ready to serve them (mostly visible when using experimental blocks storage). #2246

  • [CHANGE] Lifecycler component now enters Failed state on errors, and doesn't exit the process. (Important if you're vendoring Cortex and use Lifecycler) #2251

  • [CHANGE] /ready handler now returns 200 instead of 204. #2330
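
A readiness check that compared the status code against exactly 204 would start failing after this change. A minimal sketch of a check that works with both the old and the new response, using a hypothetical `is_ready` helper:

```python
def is_ready(status_code):
    """Treat any 2xx response from /ready as ready (covers both 200 and 204)."""
    return 200 <= status_code < 300


print(is_ready(200), is_ready(204), is_ready(503))
```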

  • [CHANGE] Better defaults for the following options: #2344

    • -<prefix>.consul.consistent-reads: Old default: true, new default: false. This reduces the load on Consul.
    • -<prefix>.consul.watch-rate-limit: Old default: 0, new default: 1. This rate limits the reads to 1 per second, which is good enough for ring watches.
    • -distributor.health-check-ingesters: Old default: false, new default: true.
    • -ingester.max-stale-chunk-idle: Old default: 0, new default: 2m. This lets us expire series that we know are stale early.
    • -ingester.spread-flushes: Old default: false, new default: true. This allows better de-duplication of data and uses less space.
    • -ingester.chunk-age-jitter: Old default: 20mins, new default: 0. This allows -ingester.spread-flushes to be set to true.
    • -<prefix>.memcached.batchsize: Old default: 0, new default: 1024. This allows batching of requests and keeps the concurrent requests low.
    • -<prefix>.memcached.consistent-hash: Old default: false, new default: true. This allows for better cache hits when the memcaches are scaled up and down.
    • -querier.batch-iterators: Old default: false, new default: true.
    • -querier.ingester-streaming: Old default: false, new default: true.
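
If a deployment relies on the previous behaviour, the old defaults can be set explicitly. The sketch below is illustrative only: `pin_old_defaults` is a hypothetical helper, the values are the old defaults listed above, and the `-<prefix>.*` flags and `-ingester.chunk-age-jitter` are left out because their prefix and exact value syntax are not spelled out here.

```python
# Old defaults from the list above, written as explicit flag values.
OLD_DEFAULTS = {
    "-distributor.health-check-ingesters": "false",
    "-ingester.max-stale-chunk-idle": "0",
    "-ingester.spread-flushes": "false",
    "-querier.batch-iterators": "false",
    "-querier.ingester-streaming": "false",
}


def pin_old_defaults(args):
    """Append old default values for flags that args does not already set."""
    present = {arg.partition("=")[0] for arg in args}
    pinned = [f"{name}={value}" for name, value in OLD_DEFAULTS.items()
              if name not in present]
    return list(args) + pinned


print(pin_old_defaults(["-querier.batch-iterators=true"]))
```

Flags already present on the command line are left untouched, so explicit choices take precedence over the pinned values.
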
  • [CHANGE] Experimental TSDB: Added -experimental.tsdb.bucket-store.postings-cache-compression-enabled to enable postings compression when storing to cache. #2335

  • [CHANGE] Experimental TSDB: Added -compactor.deletion-delay, the time to wait before a block marked for deletion is deleted from the bucket. If not 0, blocks are first marked for deletion and the compactor later deletes the marked blocks from the bucket. If the delay is 0, blocks are deleted straight away. Note that deleting blocks immediately can cause query failures if the store gateway / querier still has the block loaded, or if the compactor is ignoring the deletion because it's compacting the block at the same time. Default value is 48h. #2335

  • [CHANGE] Experimental TSDB: Added -experimental.tsdb.bucket-store.index-cache.postings-compression-enabled, to set the duration after which blocks marked for deletion are filtered out while fetching blocks used for querying. This option allows the querier to ignore blocks that are marked for deletion with some delay. This ensures the store can still serve blocks that are meant to be deleted but do not have a replacement yet. Default is 24h, half of the default value for -compactor.deletion-delay. #2335
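
The interaction between the two delays can be pictured with a simplified timeline model. This is a sketch under stated assumptions, not Cortex's implementation: `block_state` and its return strings are hypothetical, and the behaviour exactly at each boundary is an assumption.

```python
from datetime import datetime, timedelta

DELETION_DELAY = timedelta(hours=48)       # default of -compactor.deletion-delay
IGNORE_MARKED_AFTER = timedelta(hours=24)  # querier-side default described above


def block_state(marked_at, now,
                deletion_delay=DELETION_DELAY,
                ignore_after=IGNORE_MARKED_AFTER):
    """Classify a block that was marked for deletion at marked_at."""
    age = now - marked_at
    if age >= deletion_delay:
        return "deleted"    # the compactor may remove it from the bucket
    if age >= ignore_after:
        return "ignored"    # filtered out when fetching blocks for querying
    return "queryable"      # still served while no replacement may exist


marked = datetime(2020, 1, 1)
for hours in (1, 30, 49):
    print(hours, block_state(marked, marked + timedelta(hours=hours)))
```

With the defaults, queriers stop using a marked block a full day before it can disappear from the bucket. Setting the deletion delay to 0 removes that margin.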

  • [CHANGE] Experimental TSDB: Added -experimental.tsdb.bucket-store.index-cache.memcached.max-item-size to control maximum size of item that is stored to memcached. Defaults to 1 MiB. #2335

  • [FEATURE] Added experimental storage API to the ruler service that is enabled when the -experimental.ruler.enable-api is set to true #2269

    • -ruler.storage.type flag now allows s3, gcs, and azure values
    • -ruler.storage.(s3|gcs|azure) flags exist to allow the configuration of object clients set for rule storage
  • [CHANGE] Renamed table manager metrics. #2307 #2359

    • cortex_dynamo_sync_tables_seconds -> cortex_table_manager_sync_duration_seconds
    • cortex_dynamo_table_capacity_units -> cortex_table_capacity_units
  • [FEATURE] Flusher target to flush the WAL. #2075

    • -flusher.wal-dir for the WAL directory to recover from.
    • -flusher.concurrent-flushes for number of concurrent flushes.
    • -flusher.flush-op-timeout for the duration after which a flush should time out.
  • [FEATURE] Ingesters can now have an optional availability zone set, to ensure metric replication is distributed across zones. This is set via the -ingester.availability-zone flag or the availability_zone field in the config file. #2317

  • [ENHANCEMENT] Better re-use of connections to DynamoDB and S3. #2268

  • [ENHANCEMENT] Experimental TSDB: Add support for local filesystem backend. #2245

  • [ENHANCEMENT] Experimental TSDB: Added memcached support for the TSDB index cache. #2290

  • [ENHANCEMENT] Experimental TSDB: Removed gRPC server to communicate between querier and BucketStore. #2324

  • [ENHANCEMENT] Allow 1w (where w denotes week) ...
