Expose metrics in protobuf format #1604

Closed
invidian opened this issue Oct 11, 2021 · 20 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@invidian
Member

What would you like to be added:

kube-apiserver and kubelet (and probably other core Kubernetes components) support serving Prometheus metrics in protobuf format when the scraper requests it:

> GET /metrics HTTP/2
> Host: 10.0.0.12:10250
> user-agent: curl/7.78.0
> accept: application/vnd.google.protobuf;proto=io.prometheus.client.MetricFamily;encoding=delimited;q=0.7,text/plain;version=0.0.4;q=0.3
>
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [1841 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
} [5 bytes data]
< HTTP/2 200
< content-type: application/vnd.google.protobuf; proto=io.prometheus.client.MetricFamily; encoding=delimited
< date: Mon, 11 Oct 2021 08:08:42 GMT
<

KSM, however, only responds with the plain-text format, even when protobuf is requested:

09:24:13.100367 lo    In  IP6 (flowlabel 0x0b53e, hlim 64, next-header TCP (6) payload length: 290) ::1.42542 > ::1.8080: Flags [P.], cksum 0x012a (incorrect -> 0xc0de), seq 1:259, ack 1, win 512, options [nop,nop,TS val 963065885 ecr 963065885], length 258: HTTP, length: 258
        GET /metrics HTTP/1.1
        Host: localhost:8080
        User-Agent: Go-http-client/1.1
        Accept: application/vnd.google.protobuf;proto=io.prometheus.client.MetricFamily;encoding=delimited;q=0.7,text/plain;version=0.0.4;q=0.3
        Accept-Encoding: gzip
        Connection: close

09:24:13.100379 lo    In  IP6 (flowlabel 0x541a0, hlim 64, next-header TCP (6) payload length: 32) ::1.8080 > ::1.42542: Flags [.], cksum 0x0028 (incorrect -> 0x7144), ack 259, win 510, options [nop,nop,TS val 963065885 ecr 963065885], length 0
09:24:13.225339 lo    In  IP6 (flowlabel 0x541a0, hlim 64, next-header TCP (6) payload length: 32800) ::1.8080 > ::1.42542: Flags [P.], cksum 0x8028 (incorrect -> 0x23cb), seq 1:32769, ack 259, win 512, options [nop,nop,TS val 963066010 ecr 963065885], length 32768: HTTP, length: 32768
        HTTP/1.1 200 OK
        Content-Type: text/plain; version=0.0.4
        Date: Mon, 11 Oct 2021 07:24:13 GMT
        Connection: close
        Transfer-Encoding: chunked

Protobuf clearly reduces network traffic: in quick testing, the amount of data transferred was 3 to 7 times smaller than with the plain-text format. I've seen issue #498, but I couldn't find anything related to this topic there. Protobuf encoding may also be more CPU efficient.
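For illustration, here is a minimal, hypothetical sketch of how a handler could honor that Accept header using the Prometheus expfmt package: it parses an already-rendered text exposition into MetricFamily messages and re-encodes them in whichever format expfmt.Negotiate selects. This is not how KSM is structured today, the sample metric is made up, and re-parsing text like this would negate much of the efficiency gain; it only demonstrates the negotiation mechanics.

package main

import (
	"log"
	"net/http"
	"strings"

	"github.com/prometheus/common/expfmt"
)

func metricsHandler(w http.ResponseWriter, r *http.Request) {
	// Stand-in for the text exposition KSM already has pre-rendered.
	text := "# HELP kube_pod_info Information about pod.\n" +
		"# TYPE kube_pod_info gauge\n" +
		"kube_pod_info{namespace=\"default\",pod=\"example\"} 1\n"

	// Parse the text format into protobuf MetricFamily messages.
	var parser expfmt.TextParser
	families, err := parser.TextToMetricFamilies(strings.NewReader(text))
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	// Negotiate returns delimited protobuf when the Accept header asks for
	// application/vnd.google.protobuf, and the text format otherwise.
	format := expfmt.Negotiate(r.Header)
	w.Header().Set("Content-Type", string(format))

	enc := expfmt.NewEncoder(w, format)
	for _, mf := range families {
		if err := enc.Encode(mf); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
	}
}

func main() {
	http.HandleFunc("/metrics", metricsHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}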

Why is this needed:

To make KSM consume fewer resources and to bring it more in line with core Kubernetes components.

Additional context

Discovered as part of work on newrelic/nri-kubernetes#234.

@invidian invidian added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 11, 2021
@Serializator
Contributor

DISCLAIMER: I do not mean to criticize the existing code or the person who wrote it. This is only meant as "what should be done to allow for an easier path to implementing Protocol Buffers", not as criticism.

(metricshandler.MetricsHandler).ServeHTTP is responsible for serving HTTP requests for metrics. However, metricshandler.MetricsHandler is responsible not only for writing metrics to the response (to be a bit more specific than "serving HTTP requests") but also for sharding and applying compression (gzip).

https://github.com/kubernetes/kube-state-metrics/blob/master/pkg/metricshandler/metrics_handler.go

I think this does too much and should be refactored before even trying to implement Protocol Buffers.

  • Sharding should be refactored outside of the HTTP handler, such that the HTTP handler is only aware of the metrics that should be written.

  • GZIP compression should be "decorated" onto the HTTP handler or http.ResponseWriter before or after it is passed into (metricshandler.MetricsHandler).ServeHTTP, taking the responsibility of applying compression away from the HTTP handler (see the sketch below).
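A hypothetical sketch of that decoration, assuming a plain http.Handler serves the metrics (withGzip and the dummy handler below are made-up names, not KSM code):

package main

import (
	"compress/gzip"
	"log"
	"net/http"
	"strings"
)

// gzipResponseWriter forwards writes through a gzip.Writer while keeping the
// original http.ResponseWriter for headers and status codes.
type gzipResponseWriter struct {
	http.ResponseWriter
	gz *gzip.Writer
}

func (w gzipResponseWriter) Write(b []byte) (int, error) {
	return w.gz.Write(b)
}

// withGzip wraps any http.Handler and compresses the response when the client
// advertises gzip support, keeping compression out of the metrics handler.
func withGzip(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
			next.ServeHTTP(w, r)
			return
		}
		w.Header().Set("Content-Encoding", "gzip")
		gz := gzip.NewWriter(w)
		defer gz.Close()
		next.ServeHTTP(gzipResponseWriter{ResponseWriter: w, gz: gz}, r)
	})
}

func main() {
	// Dummy metrics handler: it only writes bytes and knows nothing about
	// compression or sharding.
	metrics := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("kube_pod_info{namespace=\"default\",pod=\"example\"} 1\n"))
	})
	http.Handle("/metrics", withGzip(metrics))
	log.Fatal(http.ListenAndServe(":8080", nil))
}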

This goes further into (metricsstore.MetricsWriter).WriteAll, which also makes assumptions about the data format by manually writing \n to the response after the "HELP" line of a metric.

https://github.com/kubernetes/kube-state-metrics/blob/master/pkg/metrics_store/metrics_writer.go#L61-L69

In metricsstore.MetricsStore the metrics are already kept as a slice of byte slices ([][]byte), though I don't know whether this is a problem for implementing Protocol Buffers or not.

https://github.com/kubernetes/kube-state-metrics/blob/master/pkg/metrics_store/metrics_store.go#L39

/cc @fpetkovski, what do you think? Both about the specific arguments I made regarding the code that should be refactored, and about the approach of doing the refactor before implementing Protocol Buffers to keep each change small and manageable.

A refactor would not only separate responsibilities but also involve thinking ahead about which abstractions to put in place to allow for an easier implementation of Protocol Buffers (or other formats, for that matter) in the future.

@Serializator
Contributor

I asked this in Slack as well, but will ask it here too so that every question and answer is kept within the issue and doesn't get lost 👍🏼

I was thinking about and prototyping a Protocol Buffers implementation in KSM, and a simple question arose that may have a simple answer, but I don't know it.

Why hasn't KSM used the Prometheus Go client library to implement its metrics? The question arose because the client library already seems to support Protocol Buffers.

https://kubernetes.slack.com/archives/CJJ529RUY/p1640638803044600
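For reference, a minimal sketch of what using the client library could look like (the kube_pod_info gauge below is purely illustrative, not KSM's actual registration code). As far as I can tell, promhttp already negotiates the exposition format, including delimited protobuf, from the scraper's Accept header:

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Illustrative metric registered with the default registry via promauto.
var podInfo = promauto.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "kube_pod_info",
		Help: "Information about pod.",
	},
	[]string{"namespace", "pod"},
)

func main() {
	podInfo.WithLabelValues("default", "example").Set(1)

	// promhttp.Handler serves the default registry and picks text or
	// protobuf based on the Accept header sent by the scraper.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}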

@invidian
Member Author

invidian commented Jan 3, 2022

Thanks for looking into it @Serializator!

Why hasn't KSM used the Go client library by Prometheus to implement its metrics?

I forgot to mention that in the opening post. I think that would definitely be a preferable solution for this issue!

@Serializator
Contributor

From @fpetkovski

The main reason is that KSM dumps a lot of metrics, especially in large clusters, and using the Go client library has proven to be slow and memory-intensive in the past.

This might be a useful read to get a bit more context prometheus/client_golang#917

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 4, 2022
@fpetkovski
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 4, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 3, 2022
@invidian
Member Author

invidian commented Jul 3, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 3, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 1, 2022
@invidian
Member Author

invidian commented Oct 1, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 1, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 30, 2022
@invidian
Member Author

invidian commented Jan 2, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 2, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 2, 2023
@invidian
Member Author

invidian commented Apr 2, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 2, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 1, 2023
@invidian
Member Author

invidian commented Jul 3, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 3, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 23, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 22, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 23, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
