From 2b41af87c2a3306d8db74f373f5d6f5e5276280b Mon Sep 17 00:00:00 2001 From: Martin Chodur Date: Tue, 30 Apr 2019 19:50:42 +0200 Subject: [PATCH 1/6] feat: added thanos check rules command Signed-off-by: Martin Chodur --- CHANGELOG.md | 46 +++---- cmd/thanos/check.go | 119 ++++++++++++++++++ cmd/thanos/check_test.go | 29 +++++ cmd/thanos/main.go | 1 + .../rules-files/invalid-rules-data.yaml | 12 ++ .../rules-files/invalid-yaml-format.yaml | 3 + cmd/thanos/testdata/rules-files/valid.yaml | 20 +++ docs/components/check.md | 82 ++++++++++++ scripts/genflagdocs.sh | 7 +- 9 files changed, 296 insertions(+), 23 deletions(-) create mode 100644 cmd/thanos/check.go create mode 100644 cmd/thanos/check_test.go create mode 100644 cmd/thanos/testdata/rules-files/invalid-rules-data.yaml create mode 100644 cmd/thanos/testdata/rules-files/invalid-yaml-format.yaml create mode 100644 cmd/thanos/testdata/rules-files/valid.yaml create mode 100644 docs/components/check.md diff --git a/CHANGELOG.md b/CHANGELOG.md index 96a0d652ff..64f5c62e95 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -29,6 +29,8 @@ This version moved tarballs to Golang 1.12.5 from 1.11 as well, so same warning ### Added - [#1094](https://github.com/improbable-eng/thanos/pull/1094) Allow configuring the response header timeout for the S3 client. +- [#1097](https://github.com/improbable-eng/thanos/pull/1097) Added `thanos check rules` linter for Thanos rule rules files. + ### Changed @@ -105,28 +107,28 @@ Using cadvisor `container_memory_usage_bytes` metric could be misleading e.g: ht New options: New Store flags: - + * `--store.grpc.series-sample-limit` limits the amount of samples that might be retrieved on a single Series() call. By default it is 0. Consider enabling it by setting it to more than 0 if you are running on limited resources. * `--store.grpc.series-max-concurrency` limits the number of concurrent Series() calls in Thanos Store. By default it is 20. Considering making it lower or bigger depending on the scale of your deployment. New Store metrics: - + * `thanos_bucket_store_queries_dropped_total` shows how many queries were dropped due to the samples limit; * `thanos_bucket_store_queries_concurrent_max` is a constant metric which shows how many Series() calls can concurrently be executed by Thanos Store; * `thanos_bucket_store_queries_in_flight` shows how many queries are currently "in flight" i.e. they are being executed; * `thanos_bucket_store_gate_duration_seconds` shows how many seconds it took for queries to pass through the gate in both cases - when that fails and when it does not. - + New Store tracing span: * `store_query_gate_ismyturn` shows how long it took for a query to pass (or not) through the gate. - -- [#1016](https://github.com/improbable-eng/thanos/pull/1016) Added option for another DNS resolver (miekg/dns client). + +- [#1016](https://github.com/improbable-eng/thanos/pull/1016) Added option for another DNS resolver (miekg/dns client). Note that this is required to have SRV resolution working on [Golang 1.11+ with KubeDNS below v1.14](https://github.com/golang/go/issues/27546) New Querier and Ruler flag: `-- store.sd-dns-resolver` which allows to specify resolver to use. Either `golang` or `miekgdns` - + - [#986](https://github.com/improbable-eng/thanos/pull/986) Allow to save some startup & sync time in store gateway as it is no longer needed to compute index-cache from block index on its own for larger blocks. - The store Gateway still can do it, but it first checks bucket if there is index-cached uploaded already. - In the same time, compactor precomputes the index cache file on every compaction. + The store Gateway still can do it, but it first checks bucket if there is index-cached uploaded already. + In the same time, compactor precomputes the index cache file on every compaction. New Compactor flag: `--index.generate-missing-cache-file` was added to allow quicker addition of index cache files. If enabled it precomputes missing files on compactor startup. Note that it will take time and it's only one-off step per bucket. @@ -143,31 +145,31 @@ Note that this is required to have SRV resolution working on [Golang 1.11+ with - [#1021](https://github.com/improbable-eng/thanos/pull/1021) Query API `series` now supports POST method. - [#939](https://github.com/improbable-eng/thanos/pull/939) Query API `query_range` now supports POST method. -### Changed +### Changed - [#970](https://github.com/improbable-eng/thanos/pull/970) Deprecated `partial_response_disabled` proto field. Added `partial_response_strategy` instead. Both in gRPC and Query API. No `PartialResponseStrategy` field for `RuleGroups` by default means `abort` strategy (old PartialResponse disabled) as this is recommended option for Rules and alerts. Metrics: - + * Added `thanos_rule_evaluation_with_warnings_total` to Ruler. * DNS `thanos_ruler_query_apis*` are now `thanos_ruler_query_apis_*` for consistency. * DNS `thanos_querier_store_apis*` are now `thanos_querier_store_apis__*` for consistency. * Query Gate `thanos_bucket_store_series*` are now `thanos_bucket_store_series_*` for consistency. * Most of thanos ruler metris related to rule manager has `strategy` label. - + Ruler tracing spans: - + * `/rule_instant_query HTTP[client]` is now `/rule_instant_query_part_resp_abort HTTP[client]"` if request is for abort strategy. - + - [#1009](https://github.com/improbable-eng/thanos/pull/1009): Upgraded Prometheus (~v2.7.0-rc.0 to v2.8.1) and TSDB (`v0.4.0` to `v0.6.1`) deps. - + Changes that affects Thanos: - * query: - * [ENHANCEMENT] In histogram_quantile merge buckets with equivalent le values. #5158. - * [ENHANCEMENT] Show list of offending labels in the error message in many-to-many scenarios. #5189 + * query: + * [ENHANCEMENT] In histogram_quantile merge buckets with equivalent le values. #5158. + * [ENHANCEMENT] Show list of offending labels in the error message in many-to-many scenarios. #5189 * [BUGFIX] Fix panic when aggregator param is not a literal. #5290 - * ruler: + * ruler: * [ENHANCEMENT] Reduce time that Alertmanagers are in flux when reloaded. #5126 * [BUGFIX] prometheus_rule_group_last_evaluation_timestamp_seconds is now a unix timestamp. #5186 * [BUGFIX] prometheus_rule_group_last_duration_seconds now reports seconds instead of nanoseconds. Fixes our [issue #1027](https://github.com/improbable-eng/thanos/issues/1027) @@ -179,26 +181,26 @@ Note that this is required to have SRV resolution working on [Golang 1.11+ with * [CHANGE] *breaking* Renamed flag `--sync-delay` to `--consistency-delay` [#1053](https://github.com/improbable-eng/thanos/pull/1053) For ruler essentially whole TSDB CHANGELOG applies beween v0.4.0-v0.6.1: https://github.com/prometheus/tsdb/blob/master/CHANGELOG.md - + Note that this was added on TSDB and Prometheus: [FEATURE] Time-ovelapping blocks are now allowed. #370 Whoever due to nature of Thanos compaction (distributed systems), for safety reason this is disabled for Thanos compactor for now. - [#868](https://github.com/improbable-eng/thanos/pull/868) Go has been updated to 1.12. -- [#1055](https://github.com/improbable-eng/thanos/pull/1055) Gossip flags are now disabled by default and deprecated. +- [#1055](https://github.com/improbable-eng/thanos/pull/1055) Gossip flags are now disabled by default and deprecated. - [#964](https://github.com/improbable-eng/thanos/pull/964) repair: Repair process now sorts the series and labels within block. - [#1073](https://github.com/improbable-eng/thanos/pull/1073) Store: index cache for requests. It now calculates the size properly (includes slice header), has anti-deadlock safeguard and reports more metrics. ### Fixed - [#921](https://github.com/improbable-eng/thanos/pull/921) `thanos_objstore_bucket_last_successful_upload_time` now does not appear when no blocks have been uploaded so far. -- [#966](https://github.com/improbable-eng/thanos/pull/966) Bucket: verify no longer warns about overlapping blocks, that overlap `0s` +- [#966](https://github.com/improbable-eng/thanos/pull/966) Bucket: verify no longer warns about overlapping blocks, that overlap `0s` - [#848](https://github.com/improbable-eng/thanos/pull/848) Compact: now correctly works with time series with duplicate labels. - [#894](https://github.com/improbable-eng/thanos/pull/894) Thanos Rule: UI now correctly shows evaluation time. - [#865](https://github.com/improbable-eng/thanos/pull/865) Query: now properly parses DNS SRV Service Discovery. - [#889](https://github.com/improbable-eng/thanos/pull/889) Store: added safeguard against merging posting groups segfault - [#941](https://github.com/improbable-eng/thanos/pull/941) Sidecar: added better handling of intermediate restarts. - [#933](https://github.com/improbable-eng/thanos/pull/933) Query: Fixed 30 seconds lag of adding new store to query. -- [#962](https://github.com/improbable-eng/thanos/pull/962) Sidecar: Make config reloader file writes atomic. +- [#962](https://github.com/improbable-eng/thanos/pull/962) Sidecar: Make config reloader file writes atomic. - [#982](https://github.com/improbable-eng/thanos/pull/982) Query: now advertises Min & Max Time accordingly to the nodes. - [#1041](https://github.com/improbable-eng/thanos/issues/1038) Ruler is now able to return long time range queries. - [#904](https://github.com/improbable-eng/thanos/pull/904) Compact: Skip compaction for blocks with no samples. diff --git a/cmd/thanos/check.go b/cmd/thanos/check.go new file mode 100644 index 0000000000..25d7be8313 --- /dev/null +++ b/cmd/thanos/check.go @@ -0,0 +1,119 @@ +package main + +import ( + "io/ioutil" + + "github.com/go-kit/kit/log" + "github.com/go-kit/kit/log/level" + thanosrule "github.com/improbable-eng/thanos/pkg/rule" + "github.com/oklog/run" + "github.com/opentracing/opentracing-go" + "github.com/prometheus/client_golang/prometheus" + "github.com/prometheus/prometheus/pkg/rulefmt" + "github.com/prometheus/tsdb" + "gopkg.in/alecthomas/kingpin.v2" + "gopkg.in/yaml.v2" +) + +func registerChecks(m map[string]setupFunc, app *kingpin.Application, name string) { + cmd := app.Command(name, "Linting tools for Thanos") + registerCheckRules(m, cmd, name) +} + +func registerCheckRules(m map[string]setupFunc, root *kingpin.CmdClause, name string) { + checkRulesCmd := root.Command("rules", "Check if the rule files are valid or not.") + ruleFiles := checkRulesCmd.Arg( + "rule-files", + "The rule files to check.", + ).Required().ExistingFiles() + + m[name+" rules"] = func(g *run.Group, logger log.Logger, reg *prometheus.Registry, _ opentracing.Tracer, _ bool) error { + // Dummy actor to immediately kill the group after the run function returns. + g.Add(func() error { return nil }, func(error) {}) + return checkRulesFiles(logger, ruleFiles) + } +} + +func checkRulesFiles(logger log.Logger, files *[]string) error { + failed := tsdb.MultiError{} + + for _, f := range *files { + n, errs := checkRules(logger, f) + if errs.Err() != nil { + level.Error(logger).Log("result", "FAILED") + for _, e := range errs { + level.Error(logger).Log("error", e.Error()) + failed.Add(e) + } + level.Info(logger).Log() + continue + } + level.Info(logger).Log("result", "SUCCESS", "rules found", n) + } + if failed.Err() != nil { + return failed + } + return nil +} + +func checkRules(logger log.Logger, filename string) (int, tsdb.MultiError) { + level.Info(logger).Log("msg", "checking", "filename", filename) + checkErrors := tsdb.MultiError{} + + b, err := ioutil.ReadFile(filename) + if err != nil { + checkErrors.Add(err) + return 0, checkErrors + } + + var rgs thanosrule.RuleGroups + if err := yaml.Unmarshal(b, &rgs); err != nil { + checkErrors.Add(err) + return 0, checkErrors + } + + // We need to convert Thanos rules to Prometheus rules so we can use their validation. + promRgs := thanosRuleGroupsToPromRuleGroups(rgs) + if errs := promRgs.Validate(); errs != nil { + for _, e := range errs { + checkErrors.Add(e) + } + return 0, checkErrors + } + + numRules := 0 + for _, rg := range rgs.Groups { + numRules += len(rg.Rules) + } + + return numRules, checkErrors +} + +func thanosRuleGroupsToPromRuleGroups(ruleGroups thanosrule.RuleGroups) rulefmt.RuleGroups { + promRuleGroups := rulefmt.RuleGroups{Groups: []rulefmt.RuleGroup{}} + for _, g := range ruleGroups.Groups { + group := rulefmt.RuleGroup{ + Name: g.Name, + Interval: g.Interval, + Rules: []rulefmt.Rule{}, + } + for _, r := range g.Rules { + group.Rules = append( + group.Rules, + rulefmt.Rule{ + Record: r.Record, + Alert: r.Alert, + Expr: r.Expr, + For: r.For, + Labels: r.Labels, + Annotations: r.Annotations, + }, + ) + } + promRuleGroups.Groups = append( + promRuleGroups.Groups, + group, + ) + } + return promRuleGroups +} diff --git a/cmd/thanos/check_test.go b/cmd/thanos/check_test.go new file mode 100644 index 0000000000..e6c1d6e797 --- /dev/null +++ b/cmd/thanos/check_test.go @@ -0,0 +1,29 @@ +package main + +import ( + "testing" + + "github.com/go-kit/kit/log" + "github.com/improbable-eng/thanos/pkg/testutil" +) + +func Test_checkRules(t *testing.T) { + + validFiles := []string{ + "./testdata/rules-files/valid.yaml", + } + + invalidFiles := [][]string{ + []string{"./testdata/rules-files/non-existing-file.yaml"}, + []string{"./testdata/rules-files/invalid-yaml-format.yaml"}, + []string{"./testdata/rules-files/invalid-rules-data.yaml"}, + } + + logger := log.NewNopLogger() + + testutil.Ok(t, checkRulesFiles(logger, &validFiles)) + + for _, fn := range invalidFiles { + testutil.NotOk(t, checkRulesFiles(logger, &fn)) + } +} diff --git a/cmd/thanos/main.go b/cmd/thanos/main.go index 538b5d58c1..51e66c9206 100644 --- a/cmd/thanos/main.go +++ b/cmd/thanos/main.go @@ -77,6 +77,7 @@ func main() { registerBucket(cmds, app, "bucket") registerDownsample(cmds, app, "downsample") registerReceive(cmds, app, "receive") + registerChecks(cmds, app, "check") cmd, err := app.Parse(os.Args[1:]) if err != nil { diff --git a/cmd/thanos/testdata/rules-files/invalid-rules-data.yaml b/cmd/thanos/testdata/rules-files/invalid-rules-data.yaml new file mode 100644 index 0000000000..575783d716 --- /dev/null +++ b/cmd/thanos/testdata/rules-files/invalid-rules-data.yaml @@ -0,0 +1,12 @@ +groups: + - name: null + partial_response_strategy: "warn" + interval: 2m + rules: + - alert: TestAlert + partial_response_strategy: "warn" + expr: 1 + labels: + key: value + annotations: + key: value diff --git a/cmd/thanos/testdata/rules-files/invalid-yaml-format.yaml b/cmd/thanos/testdata/rules-files/invalid-yaml-format.yaml new file mode 100644 index 0000000000..58efff2010 --- /dev/null +++ b/cmd/thanos/testdata/rules-files/invalid-yaml-format.yaml @@ -0,0 +1,3 @@ +groups: + - name: test + invalid_yaml_reason diff --git a/cmd/thanos/testdata/rules-files/valid.yaml b/cmd/thanos/testdata/rules-files/valid.yaml new file mode 100644 index 0000000000..c160a6f337 --- /dev/null +++ b/cmd/thanos/testdata/rules-files/valid.yaml @@ -0,0 +1,20 @@ +groups: + - name: test-alert-group + partial_response_strategy: "warn" + interval: 2m + rules: + - alert: TestAlert + partial_response_strategy: "warn" + expr: 1 + labels: + key: value + annotations: + key: value + + - name: test-rule-group + partial_response_strategy: "warn" + interval: 2m + rules: + - record: test_metric + expr: 1 + partial_response_strategy: "warn" diff --git a/docs/components/check.md b/docs/components/check.md new file mode 100644 index 0000000000..58cbfb9f3d --- /dev/null +++ b/docs/components/check.md @@ -0,0 +1,82 @@ +--- +title: Check +type: docs +menu: components +--- + +# Check + +The check component contains tools for validation of Prometheus rules. + +## Deployment +## Flags + +[embedmd]:# (flags/check.txt $) +```$ +usage: thanos check [ ...] + +Linting tools for Thanos + +Flags: + -h, --help Show context-sensitive help (also try --help-long and + --help-man). + --version Show application version. + --log.level=info Log filtering level. + --log.format=logfmt Log format to use. + --gcloudtrace.project=GCLOUDTRACE.PROJECT + GCP project to send Google Cloud Trace tracings to. + If empty, tracing will be disabled. + --gcloudtrace.sample-factor=1 + How often we send traces (1/). If 0 no + trace will be sent periodically, unless forced by + baggage item. See `pkg/tracing/tracing.go` for + details. + +Subcommands: + check rules ... + Check if the rule files are valid or not. + + +``` + + +### Verify + +`check rules` checks the Prometheus rules, used by the Thanos rule node, if they are valid. +The check should be equivalent for the `promtool check rules` but that cannot be used because +Thanos rule has extended rules file syntax, which includes `partial_response_strategy` field +which `promtool` does not allow. + +If the check fails the command fails with exit code `1`, otherwise `0`. + +Example: + +``` +$ ./thanos check rules cmd/thanos/testdata/rules-files/*.yaml +``` + +[embedmd]:# (flags/check_rules.txt) +```txt +usage: thanos check rules ... + +Check if the rule files are valid or not. + +Flags: + -h, --help Show context-sensitive help (also try --help-long and + --help-man). + --version Show application version. + --log.level=info Log filtering level. + --log.format=logfmt Log format to use. + --gcloudtrace.project=GCLOUDTRACE.PROJECT + GCP project to send Google Cloud Trace tracings to. + If empty, tracing will be disabled. + --gcloudtrace.sample-factor=1 + How often we send traces (1/). If 0 no + trace will be sent periodically, unless forced by + baggage item. See `pkg/tracing/tracing.go` for + details. + +Args: + The rule files to check. + +``` diff --git a/scripts/genflagdocs.sh b/scripts/genflagdocs.sh index 84fa0bdbc8..ffa77fd9c9 100755 --- a/scripts/genflagdocs.sh +++ b/scripts/genflagdocs.sh @@ -33,7 +33,7 @@ fi CHECK=${1:-} -commands=("compact" "query" "rule" "sidecar" "store" "bucket") +commands=("compact" "query" "rule" "sidecar" "store" "bucket" "check") for x in "${commands[@]}"; do ./thanos "${x}" --help &> "docs/components/flags/${x}.txt" @@ -44,6 +44,11 @@ for x in "${bucketCommands[@]}"; do ./thanos bucket "${x}" --help &> "docs/components/flags/bucket_${x}.txt" done +checkCommands=("rules") +for x in "${checkCommands[@]}"; do + ./thanos check "${x}" --help &> "docs/components/flags/check_${x}.txt" +done + # remove white noise sed -i 's/[ \t]*$//' docs/components/flags/*.txt From b619716fabf7b242455d606662b3917a046ef745 Mon Sep 17 00:00:00 2001 From: Martin Chodur Date: Sat, 11 May 2019 08:47:22 +0200 Subject: [PATCH 2/6] feat: switched to strict unmarshal in rule check --- cmd/thanos/check.go | 17 +++++++++++++---- cmd/thanos/check_test.go | 1 + .../rules-files/invalid-unknown-field.yaml | 18 ++++++++++++++++++ cmd/thanos/testdata/rules-files/valid.yaml | 2 -- 4 files changed, 32 insertions(+), 6 deletions(-) create mode 100644 cmd/thanos/testdata/rules-files/invalid-unknown-field.yaml diff --git a/cmd/thanos/check.go b/cmd/thanos/check.go index 25d7be8313..5984b902d4 100644 --- a/cmd/thanos/check.go +++ b/cmd/thanos/check.go @@ -5,7 +5,6 @@ import ( "github.com/go-kit/kit/log" "github.com/go-kit/kit/log/level" - thanosrule "github.com/improbable-eng/thanos/pkg/rule" "github.com/oklog/run" "github.com/opentracing/opentracing-go" "github.com/prometheus/client_golang/prometheus" @@ -56,6 +55,16 @@ func checkRulesFiles(logger log.Logger, files *[]string) error { return nil } +type ThanosRuleGroup struct{ + PartialResponseStrategy string `yaml:"partial_response_strategy"` + rulefmt.RuleGroup `yaml:",inline"` +} + +type ThanosRuleGroups struct { + Groups []ThanosRuleGroup `yaml:"groups"` +} + + func checkRules(logger log.Logger, filename string) (int, tsdb.MultiError) { level.Info(logger).Log("msg", "checking", "filename", filename) checkErrors := tsdb.MultiError{} @@ -66,8 +75,8 @@ func checkRules(logger log.Logger, filename string) (int, tsdb.MultiError) { return 0, checkErrors } - var rgs thanosrule.RuleGroups - if err := yaml.Unmarshal(b, &rgs); err != nil { + var rgs ThanosRuleGroups + if err := yaml.UnmarshalStrict(b, &rgs); err != nil { checkErrors.Add(err) return 0, checkErrors } @@ -89,7 +98,7 @@ func checkRules(logger log.Logger, filename string) (int, tsdb.MultiError) { return numRules, checkErrors } -func thanosRuleGroupsToPromRuleGroups(ruleGroups thanosrule.RuleGroups) rulefmt.RuleGroups { +func thanosRuleGroupsToPromRuleGroups(ruleGroups ThanosRuleGroups) rulefmt.RuleGroups { promRuleGroups := rulefmt.RuleGroups{Groups: []rulefmt.RuleGroup{}} for _, g := range ruleGroups.Groups { group := rulefmt.RuleGroup{ diff --git a/cmd/thanos/check_test.go b/cmd/thanos/check_test.go index e6c1d6e797..71fb954a90 100644 --- a/cmd/thanos/check_test.go +++ b/cmd/thanos/check_test.go @@ -17,6 +17,7 @@ func Test_checkRules(t *testing.T) { []string{"./testdata/rules-files/non-existing-file.yaml"}, []string{"./testdata/rules-files/invalid-yaml-format.yaml"}, []string{"./testdata/rules-files/invalid-rules-data.yaml"}, + []string{"./testdata/rules-files/invalid-unknown-field.yaml"}, } logger := log.NewNopLogger() diff --git a/cmd/thanos/testdata/rules-files/invalid-unknown-field.yaml b/cmd/thanos/testdata/rules-files/invalid-unknown-field.yaml new file mode 100644 index 0000000000..d1759d7f99 --- /dev/null +++ b/cmd/thanos/testdata/rules-files/invalid-unknown-field.yaml @@ -0,0 +1,18 @@ +groups: + - name: test-alert-group + partial_response_strategy: "warn" + interrrrrrrrval: 2m + rules: + - alert: TestAlert + expr: 1 + labels: + key: value + annotations: + key: value + + - name: test-rule-group + partial_response_strategy: "warn" + interval: 2m + rules: + - record: test_metric + expr: 1 diff --git a/cmd/thanos/testdata/rules-files/valid.yaml b/cmd/thanos/testdata/rules-files/valid.yaml index c160a6f337..fb75e799e2 100644 --- a/cmd/thanos/testdata/rules-files/valid.yaml +++ b/cmd/thanos/testdata/rules-files/valid.yaml @@ -4,7 +4,6 @@ groups: interval: 2m rules: - alert: TestAlert - partial_response_strategy: "warn" expr: 1 labels: key: value @@ -17,4 +16,3 @@ groups: rules: - record: test_metric expr: 1 - partial_response_strategy: "warn" From 05bf5dbadadf3c57a59ed9c1ef0c040c1a58b62e Mon Sep 17 00:00:00 2001 From: Martin Chodur Date: Mon, 3 Jun 2019 23:02:14 +0200 Subject: [PATCH 3/6] refactor: changed logging of highest error using logger --- cmd/thanos/check.go | 7 +++---- cmd/thanos/main.go | 2 +- 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/cmd/thanos/check.go b/cmd/thanos/check.go index 5984b902d4..696634be0f 100644 --- a/cmd/thanos/check.go +++ b/cmd/thanos/check.go @@ -55,16 +55,15 @@ func checkRulesFiles(logger log.Logger, files *[]string) error { return nil } -type ThanosRuleGroup struct{ - PartialResponseStrategy string `yaml:"partial_response_strategy"` - rulefmt.RuleGroup `yaml:",inline"` +type ThanosRuleGroup struct { + PartialResponseStrategy string `yaml:"partial_response_strategy"` + rulefmt.RuleGroup `yaml:",inline"` } type ThanosRuleGroups struct { Groups []ThanosRuleGroup `yaml:"groups"` } - func checkRules(logger log.Logger, filename string) (int, tsdb.MultiError) { level.Info(logger).Log("msg", "checking", "filename", filename) checkErrors := tsdb.MultiError{} diff --git a/cmd/thanos/main.go b/cmd/thanos/main.go index 51e66c9206..caa8e29a8b 100644 --- a/cmd/thanos/main.go +++ b/cmd/thanos/main.go @@ -177,7 +177,7 @@ func main() { } if err := cmds[cmd](&g, logger, metrics, tracer, *logLevel == "debug"); err != nil { - fmt.Fprintln(os.Stderr, errors.Wrapf(err, "%s command failed", cmd)) + level.Error(logger).Log("err", errors.Wrapf(err, "%s command failed", cmd)) os.Exit(1) } From 5518fffecd92538a05e76cb796c988183b3fe952 Mon Sep 17 00:00:00 2001 From: Martin Chodur Date: Mon, 3 Jun 2019 23:05:29 +0200 Subject: [PATCH 4/6] chore: updated changelog --- CHANGELOG.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 64f5c62e95..76134fdbf7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -11,6 +11,10 @@ We use *breaking* word for marking changes that are not backward compatible (rel ## Unreleased. +### Added + +- [#1097](https://github.com/improbable-eng/thanos/pull/1097) Added `thanos check rules` linter for Thanos rule rules files. + ## [v0.5.0](https://github.com/improbable-eng/thanos/releases/tag/v0.5.0) - 2019.06.05 TL;DR: Store LRU cache is no longer leaking, Upgraded Thanos UI to Prometheus 2.9, Fixed auto-downsampling, Moved to Go 1.12.5 and more. @@ -29,8 +33,6 @@ This version moved tarballs to Golang 1.12.5 from 1.11 as well, so same warning ### Added - [#1094](https://github.com/improbable-eng/thanos/pull/1094) Allow configuring the response header timeout for the S3 client. -- [#1097](https://github.com/improbable-eng/thanos/pull/1097) Added `thanos check rules` linter for Thanos rule rules files. - ### Changed From a4ee7f338ae8f575dab473fe3a7853d0ef6c419a Mon Sep 17 00:00:00 2001 From: Martin Chodur Date: Sat, 8 Jun 2019 22:12:14 +0200 Subject: [PATCH 5/6] fix check: fixed tsdb MultiError package import --- cmd/thanos/check.go | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/cmd/thanos/check.go b/cmd/thanos/check.go index 696634be0f..4e1194e109 100644 --- a/cmd/thanos/check.go +++ b/cmd/thanos/check.go @@ -9,7 +9,7 @@ import ( "github.com/opentracing/opentracing-go" "github.com/prometheus/client_golang/prometheus" "github.com/prometheus/prometheus/pkg/rulefmt" - "github.com/prometheus/tsdb" + "github.com/prometheus/tsdb/errors" "gopkg.in/alecthomas/kingpin.v2" "gopkg.in/yaml.v2" ) @@ -34,7 +34,7 @@ func registerCheckRules(m map[string]setupFunc, root *kingpin.CmdClause, name st } func checkRulesFiles(logger log.Logger, files *[]string) error { - failed := tsdb.MultiError{} + failed := errors.MultiError{} for _, f := range *files { n, errs := checkRules(logger, f) @@ -64,9 +64,9 @@ type ThanosRuleGroups struct { Groups []ThanosRuleGroup `yaml:"groups"` } -func checkRules(logger log.Logger, filename string) (int, tsdb.MultiError) { +func checkRules(logger log.Logger, filename string) (int, errors.MultiError) { level.Info(logger).Log("msg", "checking", "filename", filename) - checkErrors := tsdb.MultiError{} + checkErrors := errors.MultiError{} b, err := ioutil.ReadFile(filename) if err != nil { From f4e9d45d171d5288ff62d48c69e245553f818526 Mon Sep 17 00:00:00 2001 From: Martin Chodur Date: Sat, 8 Jun 2019 22:16:07 +0200 Subject: [PATCH 6/6] fix docs: update check cmd flags docs --- CHANGELOG.md | 2 +- docs/components/check.md | 28 ++++++++++++---------------- 2 files changed, 13 insertions(+), 17 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 76134fdbf7..83890a11ff 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -12,7 +12,7 @@ We use *breaking* word for marking changes that are not backward compatible (rel ## Unreleased. ### Added - + - [#1097](https://github.com/improbable-eng/thanos/pull/1097) Added `thanos check rules` linter for Thanos rule rules files. ## [v0.5.0](https://github.com/improbable-eng/thanos/releases/tag/v0.5.0) - 2019.06.05 diff --git a/docs/components/check.md b/docs/components/check.md index 58cbfb9f3d..323f343df3 100644 --- a/docs/components/check.md +++ b/docs/components/check.md @@ -23,14 +23,12 @@ Flags: --version Show application version. --log.level=info Log filtering level. --log.format=logfmt Log format to use. - --gcloudtrace.project=GCLOUDTRACE.PROJECT - GCP project to send Google Cloud Trace tracings to. - If empty, tracing will be disabled. - --gcloudtrace.sample-factor=1 - How often we send traces (1/). If 0 no - trace will be sent periodically, unless forced by - baggage item. See `pkg/tracing/tracing.go` for - details. + --tracing.config-file= + Path to YAML file that contains tracing + configuration. + --tracing.config= + Alternative to 'tracing.config-file' flag. Tracing + configuration in YAML. Subcommands: check rules ... @@ -67,14 +65,12 @@ Flags: --version Show application version. --log.level=info Log filtering level. --log.format=logfmt Log format to use. - --gcloudtrace.project=GCLOUDTRACE.PROJECT - GCP project to send Google Cloud Trace tracings to. - If empty, tracing will be disabled. - --gcloudtrace.sample-factor=1 - How often we send traces (1/). If 0 no - trace will be sent periodically, unless forced by - baggage item. See `pkg/tracing/tracing.go` for - details. + --tracing.config-file= + Path to YAML file that contains tracing + configuration. + --tracing.config= + Alternative to 'tracing.config-file' flag. Tracing + configuration in YAML. Args: The rule files to check.