
Commit 87536d8

author
allenzhli
committed
Signed-off-by: allenzhli <[email protected]>
2 parents ea7efcd + 09643dc commit 87536d8

40 files changed, +1909 -153 lines changed

ADOPTERS.md

+1
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,7 @@
22

33
This is the list of organisations that are using Cortex in **production environments** to power their metrics and monitoring systems. Please send PRs to add or remove organisations.
44

5+
* [Amazon Web Services (AWS)](https://aws.amazon.com/prometheus)
56
* [Aspen Mesh](https://aspenmesh.io/)
67
* [Buoyant](https://buoyant.io/)
78
* [DigitalOcean](https://www.digitalocean.com/)

CHANGELOG.md

+8-1
@@ -6,17 +6,24 @@
66
* [CHANGE] Blocks storage: compactor is now required when running a Cortex cluster with the blocks storage, because it also keeps the bucket index updated. #3583
77
* [CHANGE] Blocks storage: block deletion marks are now stored in a per-tenant global markers/ location too, other than within the block location. The compactor, at startup, will copy deletion marks from the block location to the global location. This migration is required only once, so you can safely disable it via `-compactor.block-deletion-marks-migration-enabled=false` once new compactor has successfully started once in your cluster. #3583
88
* [ENHANCEMENT] Blocks storage: introduced a per-tenant bucket index, periodically updated by the compactor, used to avoid full bucket scanning done by queriers and store-gateways. The bucket index is updated by the compactor during blocks cleanup, on every `-compactor.cleanup-interval`. #3553 #3555 #3561 #3583
9+
* [ENHANCEMENT] Blocks storage: introduced an option `-blocks-storage.bucket-store.bucket-index.enabled` to enable the usage of the bucket index in the querier. When enabled, the querier will use the bucket index to find a tenant's blocks instead of running the periodic bucket scan. The following new metrics have been added: #3614
10+
* `cortex_bucket_index_loads_total`
11+
* `cortex_bucket_index_load_failures_total`
12+
* `cortex_bucket_index_load_duration_seconds`
13+
* `cortex_bucket_index_loaded`
914
* [ENHANCEMENT] Compactor: exported the following metrics. #3583
1015
* `cortex_bucket_blocks_count`: Total number of blocks per tenant in the bucket. Includes blocks marked for deletion.
1116
* `cortex_bucket_blocks_marked_for_deletion_count`: Total number of blocks per tenant marked for deletion in the bucket.
1217
* `cortex_bucket_index_last_successful_update_timestamp_seconds`: Timestamp of the last successful update of a tenant's bucket index.
1318
* [ENHANCEMENT] Ruler: Add `cortex_prometheus_last_evaluation_samples` to expose the number of samples generated by a rule group per tenant. #3582
1419
* [ENHANCEMENT] Memberlist: add status page (/memberlist) with available details about memberlist-based KV store and memberlist cluster. It's also possible to view KV values in Go struct or JSON format, or download for inspection. #3575
15-
* [ENHANCEMENT] Memberlist: client can now keep a size-bounded buffer with sent and received messages and display them in the admin UI (/memberlist) for troubleshooting. #3581
20+
* [ENHANCEMENT] Memberlist: client can now keep a size-bounded buffer with sent and received messages and display them in the admin UI (/memberlist) for troubleshooting. #3581 #3602
21+
* [BUGFIX] Allow `-querier.max-query-lookback` to use the `y|w|d` suffixes, like the deprecated `-store.max-look-back-period`. #3598
1622
* [BUGFIX] Query-Frontend: `cortex_query_seconds_total` now returns seconds, not nanoseconds. #3589
1723
* [ENHANCEMENT] Add api to list all tenant alertmanager configs and ruler rules. #3259
1824
- `GET /multitenant_alertmanager/configs`
1925
- `GET /ruler/rules`
26+
* [BUGFIX] Memberlist: Entry in the ring should now not appear again after using "Forget" feature (unless it's still heartbeating). #3603
2027

2128
## 1.6.0-rc.0 in progress
2229

docs/blocks-storage/_index.md

+1-1
@@ -29,7 +29,7 @@ When running the Cortex blocks storage, the Cortex architecture doesn't signific
2929

3030
The **[store-gateway](./store-gateway.md)** is responsible to query blocks and is used by the [querier](./querier.md) at query time. The store-gateway is required when running the blocks storage.
3131

32-
The **[compactor](./compactor.md)** is responsible to merge and deduplicate smaller blocks into larger ones, in order to reduce the number of blocks stored in the long-term storage for a given tenant and query them more efficiently. It also keeps the bucket index updated and, for this reason, it's a required component.
32+
The **[compactor](./compactor.md)** is responsible to merge and deduplicate smaller blocks into larger ones, in order to reduce the number of blocks stored in the long-term storage for a given tenant and query them more efficiently. It also keeps the [bucket index](./bucket-index.md) updated and, for this reason, it's a required component.
3333

3434
Finally, the [**table-manager**](../chunks-storage/table-manager.md) and the [**schema config**](../chunks-storage/schema-config.md) are **not used** by the blocks storage.
3535

docs/blocks-storage/bucket-index.md

+57
@@ -0,0 +1,57 @@
1+
---
2+
title: "Bucket Index"
3+
linkTitle: "Bucket Index"
4+
weight: 5
5+
slug: bucket-index
6+
---
7+
8+
The bucket index is a **per-tenant file containing the list of blocks and block deletion marks** in the storage. The bucket index itself is stored in the backend object storage, is periodically updated by the compactor and used by queriers to discover blocks in the storage.
9+
10+
The bucket index usage is **optional** and can be enabled via `-blocks-storage.bucket-store.bucket-index.enabled=true` (or its respective YAML config option).
11+
12+
## Benefits
13+
14+
The [querier](./querier.md) needs to have an almost up-to-date view over the entire storage bucket, in order to find the right blocks to look up at query time. Because of this, the querier needs to periodically scan the bucket to look for new blocks uploaded by the ingester or compactor, and blocks deleted (or marked for deletion) by the compactor.
15+
16+
When the bucket index is enabled, the querier periodically looks up the per-tenant bucket index instead of scanning the bucket via "list objects" operations. This brings a few benefits:
17+
18+
1. Reduced number of API calls to the object storage by querier
19+
2. No "list objects" storage API calls done by querier
20+
3. The [querier](./querier.md) is up and running immediately after the startup (no need to run an initial bucket scan)
21+
22+
## Structure of the index
23+
24+
The `bucket-index.json.gz` contains:
25+
26+
- **`blocks`**<br />
27+
List of complete blocks of a tenant, including blocks marked for deletion (partial blocks are excluded from the index).
28+
- **`block_deletion_marks`**<br />
29+
List of block deletion marks.
30+
- **`updated_at`**<br />
31+
Unix timestamp (seconds precision) of when the index was last updated (written to the storage).
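
As a rough illustration of the structure listed above, the following sketch decodes a gzip-compressed JSON document with those three top-level fields. The type and function names (`bucketIndex`, `decodeBucketIndex`, `encodeSample`) are hypothetical and not taken from the Cortex codebase. The inner layout of each block and deletion mark entry is not described here, so entries are kept as raw JSON rather than guessed at, and the sample payload is made up.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
)

// bucketIndex mirrors only the three top-level fields named in the docs.
// Entries are kept as raw JSON because their inner fields are not
// described in this section (assumption: each list is a JSON array).
type bucketIndex struct {
	Blocks             []json.RawMessage `json:"blocks"`
	BlockDeletionMarks []json.RawMessage `json:"block_deletion_marks"`
	UpdatedAt          int64             `json:"updated_at"` // Unix seconds
}

// decodeBucketIndex reads a gzip-compressed JSON bucket index from r.
func decodeBucketIndex(r io.Reader) (*bucketIndex, error) {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return nil, err
	}
	defer gz.Close()

	var idx bucketIndex
	if err := json.NewDecoder(gz).Decode(&idx); err != nil {
		return nil, err
	}
	return &idx, nil
}

// encodeSample builds a made-up compressed index, used only for the demo.
func encodeSample() (*bytes.Buffer, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	payload := `{"blocks":[{},{}],"block_deletion_marks":[{}],"updated_at":1600000000}`
	if _, err := gz.Write([]byte(payload)); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	buf, err := encodeSample()
	if err != nil {
		panic(err)
	}
	idx, err := decodeBucketIndex(buf)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(idx.Blocks), len(idx.BlockDeletionMarks), idx.UpdatedAt)
}
```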
32+
33+
## How it gets updated
34+
35+
The [compactor](./compactor.md) periodically scans the bucket and uploads an updated bucket index to the storage. The frequency at which the bucket index is updated can be configured via `-compactor.cleanup-interval`.
36+
37+
Although using the bucket index is optional, the index itself is built and updated by the compactor even if `-blocks-storage.bucket-store.bucket-index.enabled` has **not** been enabled. This is intentional, so that once a Cortex cluster operator decides to enable the bucket index in a live cluster, the bucket index for every tenant already exists and query results consistency is guaranteed. The overhead introduced by keeping the bucket index updated is expected to be insignificant.
38+
39+
## How it's used by the querier
40+
41+
The [querier](./querier.md), at query time, checks whether the bucket index for the tenant has already been loaded in memory. If not, the querier downloads it from the storage and caches it in memory.
42+
43+
_Given it's a small file, lazily downloading it doesn't significantly impact first query performance, but it allows a querier to get up and running without pre-downloading every tenant's bucket index. Moreover, if the [metadata cache](./querier.md#metadata-cache) is enabled, the bucket index will be cached for a short time in a shared cache, reducing the actual latency and number of API calls to the object storage when multiple queriers fetch the same tenant's bucket index in a short time._
44+
45+
![Querier - Bucket index](/images/blocks-storage/bucket-index-querier-logic.png)
46+
<!-- Diagram source at https://docs.google.com/presentation/d/1bHp8_zcoWCYoNU2AhO2lSagQyuIrghkCncViSqn14cU/edit -->
47+
48+
While the bucket index is in memory, a background process keeps it **updated at periodic intervals**, so that subsequent queries from the same tenant to the same querier instance will use the cached (and periodically updated) bucket index. There are two config options involved:
49+
50+
- `-blocks-storage.bucket-store.bucket-index.update-on-stale-interval`<br />
51+
This option configures how frequently a cached bucket index should be refreshed.
52+
- `-blocks-storage.bucket-store.bucket-index.update-on-error-interval`<br />
53+
If downloading a bucket index fails, the failure is cached for a short time in order to avoid hammering the backend storage. This option configures how frequently the querier retries loading a bucket index that previously failed to load.
54+
55+
If a bucket index is unused for a long time (configurable via `-blocks-storage.bucket-store.bucket-index.idle-timeout`), e.g. because that querier instance is not receiving any query from the tenant, the querier offloads it and stops keeping it updated at regular intervals. This is particularly useful for tenants which are resharded to different queriers when [shuffle sharding](../guides/shuffle-sharding.md) is enabled.
56+
57+
Finally, the querier, at query time, checks how old a bucket index is (based on its `updated_at`) and fails the query if its age exceeds `-blocks-storage.bucket-store.bucket-index.max-stale-period`. This circuit breaker ensures queriers do not return partial query results due to a stale view over the long-term storage.

docs/blocks-storage/compactor.md

+1-1
@@ -10,7 +10,7 @@ slug: compactor
1010
The **compactor** is a service which is responsible to:
1111

1212
- Compact multiple blocks of a given tenant into a single optimized larger block. This helps to reduce storage costs (deduplication, index size reduction), and increase query speed (querying fewer blocks is faster).
13-
- Keep the per-tenant bucket index updated. The bucket index is used by [queriers](./querier.md) and [store-gateways](./store-gateway.md) to discover new blocks in the storage.
13+
- Keep the per-tenant bucket index updated. The [bucket index](./bucket-index.md) is used by [queriers](./querier.md) to discover new blocks in the storage.
1414

1515
The compactor is **stateless**.
1616

docs/blocks-storage/compactor.template

+1-1
@@ -10,7 +10,7 @@ slug: compactor
1010
The **compactor** is a service which is responsible to:
1111

1212
- Compact multiple blocks of a given tenant into a single optimized larger block. This helps to reduce storage costs (deduplication, index size reduction), and increase query speed (querying fewer blocks is faster).
13-
- Keep the per-tenant bucket index updated. The bucket index is used by [queriers](./querier.md) and [store-gateways](./store-gateway.md) to discover new blocks in the storage.
13+
- Keep the per-tenant bucket index updated. The [bucket index](./bucket-index.md) is used by [queriers](./querier.md) to discover new blocks in the storage.
1414

1515
The compactor is **stateless**.
1616

docs/blocks-storage/querier.md

+64-4
@@ -13,12 +13,28 @@ The querier is **stateless**.
1313

1414
## How it works
1515

16-
At startup **queriers** iterate over the entire storage bucket to discover all tenants blocks and download the `meta.json` for each block. During this initial bucket scanning phase, a querier is not ready to handle incoming queries yet and its `/ready` readiness probe endpoint will fail.
16+
The querier needs to have an almost up-to-date view over the entire storage bucket, in order to find the right blocks to look up at query time. The querier can keep the bucket view updated in two different ways:
17+
18+
1. Periodically scanning the bucket (default)
19+
2. Periodically downloading the [bucket index](./bucket-index.md)
20+
21+
### Bucket index disabled (default)
22+
23+
At startup, **queriers** iterate over the entire storage bucket to discover all tenants' blocks and download the `meta.json` for each block. During this initial bucket scanning phase, a querier is not ready to handle incoming queries yet and its `/ready` readiness probe endpoint will fail.
1724

1825
While running, queriers periodically iterate over the storage bucket to discover new tenants and recently uploaded blocks. Queriers do **not** download any content from blocks except a small `meta.json` file containing the block's metadata (including the minimum and maximum timestamp of samples within the block).
1926

2027
Queriers use the metadata to compute the list of blocks that need to be queried at query time and fetch matching series from the [store-gateway](./store-gateway.md) instances holding the required blocks.
2128

29+
### Bucket index enabled
30+
31+
When the [bucket index](./bucket-index.md) is enabled, queriers lazily download the bucket index upon the first query received for a given tenant, cache it in memory and periodically keep it updated. The bucket index contains the list of blocks and block deletion marks of a tenant, which is later used during the query execution to find the set of blocks that need to be queried for the given query.
32+
33+
Given that the bucket index removes the need to scan the bucket, it brings a few benefits:
34+
35+
1. The querier is expected to be ready shortly after startup.
36+
2. Lower volume of API calls to object storage.
37+
2238
### Anatomy of a query request
2339

2440
When a querier receives a query range request, it contains the following parameters:
@@ -60,6 +76,7 @@ Caching is optional, but **highly recommended** in a production environment. Ple
6076
- List of blocks per tenant
6177
- Block's `meta.json` content
6278
- Block's `deletion-mark.json` existence and content
79+
- Tenant's `bucket-index.json.gz` content
6380

6481
Using the metadata cache can significantly reduce the number of API calls to object storage and prevents the number of these API calls from scaling linearly with the number of querier and store-gateway instances (because the bucket is periodically scanned and synched by each querier and store-gateway).
6582

@@ -341,8 +358,8 @@ blocks_storage:
341358
# CLI flag: -blocks-storage.filesystem.dir
342359
[dir: <string> | default = ""]
343360
344-
# This configures how the store-gateway synchronizes blocks stored in the
345-
# bucket.
361+
# This configures how the querier and store-gateway discover and synchronize
362+
# blocks stored in the bucket.
346363
bucket_store:
347364
# Directory to store synchronized TSDB index headers.
348365
# CLI flag: -blocks-storage.bucket-store.sync-dir
@@ -579,14 +596,30 @@ blocks_storage:
579596
# CLI flag: -blocks-storage.bucket-store.metadata-cache.metafile-content-ttl
580597
[metafile_content_ttl: <duration> | default = 24h]
581598
582-
# Maximum size of metafile content to cache in bytes.
599+
# Maximum size of metafile content to cache in bytes. Caching will be
600+
# skipped if the content exceeds this size. This is useful to avoid
601+
# network round trip for large content if the configured caching backend
602+
# has a hard limit on cached items size (in this case, you should set
603+
# this limit to the same limit in the caching backend).
583604
# CLI flag: -blocks-storage.bucket-store.metadata-cache.metafile-max-size-bytes
584605
[metafile_max_size_bytes: <int> | default = 1048576]
585606
586607
# How long to cache attributes of the block metafile.
587608
# CLI flag: -blocks-storage.bucket-store.metadata-cache.metafile-attributes-ttl
588609
[metafile_attributes_ttl: <duration> | default = 168h]
589610
611+
# How long to cache content of the bucket index.
612+
# CLI flag: -blocks-storage.bucket-store.metadata-cache.bucket-index-content-ttl
613+
[bucket_index_content_ttl: <duration> | default = 5m]
614+
615+
# Maximum size of bucket index content to cache in bytes. Caching will be
616+
# skipped if the content exceeds this size. This is useful to avoid
617+
# network round trip for large content if the configured caching backend
618+
# has a hard limit on cached items size (in this case, you should set
619+
# this limit to the same limit in the caching backend).
620+
# CLI flag: -blocks-storage.bucket-store.metadata-cache.bucket-index-max-size-bytes
621+
[bucket_index_max_size_bytes: <int> | default = 1048576]
622+
590623
# Duration after which the blocks marked for deletion will be filtered out
591624
# while fetching blocks. The idea of ignore-deletion-marks-delay is to
592625
# ignore blocks that are marked for deletion with some delay. This ensures
@@ -596,6 +629,33 @@ blocks_storage:
596629
# CLI flag: -blocks-storage.bucket-store.ignore-deletion-marks-delay
597630
[ignore_deletion_mark_delay: <duration> | default = 6h]
598631
632+
bucket_index:
633+
# True to enable querier to discover blocks in the storage via bucket
634+
# index instead of bucket scanning.
635+
# CLI flag: -blocks-storage.bucket-store.bucket-index.enabled
636+
[enabled: <boolean> | default = false]
637+
638+
# How frequently a cached bucket index should be refreshed.
639+
# CLI flag: -blocks-storage.bucket-store.bucket-index.update-on-stale-interval
640+
[update_on_stale_interval: <duration> | default = 15m]
641+
642+
# How frequently a bucket index, which previously failed to load, should
643+
# be tried to load again.
644+
# CLI flag: -blocks-storage.bucket-store.bucket-index.update-on-error-interval
645+
[update_on_error_interval: <duration> | default = 1m]
646+
647+
# How long an unused bucket index should be cached. Once this timeout
648+
# expires, the unused bucket index is removed from the in-memory cache.
649+
# CLI flag: -blocks-storage.bucket-store.bucket-index.idle-timeout
650+
[idle_timeout: <duration> | default = 1h]
651+
652+
# The maximum allowed age of a bucket index (last updated) before queries
653+
# start failing because the bucket index is too old. The bucket index is
654+
# periodically updated by the compactor, while this check is enforced in
655+
# the querier (at query time).
656+
# CLI flag: -blocks-storage.bucket-store.bucket-index.max-stale-period
657+
[max_stale_period: <duration> | default = 1h]
658+
599659
tsdb:
600660
# Local directory to store TSDBs in the ingesters.
601661
# CLI flag: -blocks-storage.tsdb.dir
