Skip to content
This repository was archived by the owner on Jul 11, 2023. It is now read-only.

feat(injector): Set probe timeouts based on pod deployment spec #4149

Merged
merged 1 commit into from
Sep 27, 2021

Conversation

davinci26
Copy link

@davinci26 davinci26 commented Sep 21, 2021

Fixes #4137

Overall we aim to maintain the following invariance:

For all health probes we want the following to be true:

All Envoy (implicit or explicit) timeouts >= the timeout specified in
the pod deployment spec.

Changes:

  • Remove connect timeouts from all clusters
  • For health probe clusters we set the route timeout to be equal to the
    timeout set on the kubernetes pod. This is only applied on HTTP routes
    since plain tcp routes have infinite timeout.

Signed-off-by: Sotiris Nanopoulos [email protected]

Description:

Testing done:

Affected area:

Functional Area
New Functionality [ ]
CI System [ ]
CLI Tool [ ]
Certificate Management [ ]
Control Plane [ ]
Demo [ ]
Documentation [ ]
Egress [ ]
Ingress [ ]
Install [ ]
Networking [ ]
Observability [ ]
Performance [ ]
SMI Policy [ ]
Security [ ]
Sidecar Injection [ ]
Tests [ ]
Upgrade [ ]
Other [ ]

Please answer the following questions with yes/no.

  1. Does this change contain code from or inspired by another project?

    • Did you notify the maintainers and provide attribution?
  2. Is this a breaking change?

@davinci26
Copy link
Author

davinci26 commented Sep 21, 2021

Creating as draft to make sure that lint passes, because it timeouts on my devbox

@codecov-commenter
Copy link

codecov-commenter commented Sep 21, 2021

Codecov Report

Merging #4149 (1397512) into main (dcb2629) will increase coverage by 0.34%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4149      +/-   ##
==========================================
+ Coverage   69.47%   69.81%   +0.34%     
==========================================
  Files         210      212       +2     
  Lines       11423    11579     +156     
==========================================
+ Hits         7936     8084     +148     
- Misses       3434     3442       +8     
  Partials       53       53              
Flag Coverage Δ
unittests 69.81% <100.00%> (+0.34%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/envoy/cds/cluster.go 93.20% <100.00%> (-0.20%) ⬇️
pkg/envoy/cds/tracing.go 100.00% <100.00%> (ø)
pkg/injector/envoy_config_health_probes.go 94.14% <100.00%> (+0.04%) ⬆️
pkg/injector/health_probes.go 100.00% <100.00%> (ø)
pkg/reconciler/mutating_webhook_handler.go 88.57% <0.00%> (-6.03%) ⬇️
pkg/reconciler/crd_handler.go 85.00% <0.00%> (-5.25%) ⬇️
pkg/crdconversion/crdconversion.go 72.44% <0.00%> (-3.07%) ⬇️
cmd/cli/util.go 71.42% <0.00%> (-1.24%) ⬇️
pkg/validator/patch.go 95.40% <0.00%> (-0.60%) ⬇️
pkg/envoy/ads/stream.go 10.55% <0.00%> (-0.17%) ⬇️
... and 20 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dcb2629...1397512. Read the comment docs.

@davinci26 davinci26 force-pushed the probeTimeouts branch 3 times, most recently from 1397512 to 5376e77 Compare September 27, 2021 16:48
@davinci26 davinci26 marked this pull request as ready for review September 27, 2021 17:15
@davinci26 davinci26 requested a review from a team as a code owner September 27, 2021 17:15
Copy link
Member

@shashankram shashankram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davinci26 Thanks for addressing the issue based on our discussion. This change looks good, except that the commit and PR description is misleading (references to idle_timeout). Could you please address this, looks good otherwise.

Fixes openservicemesh#4137

Overall we aim to maintain the following invariance:

For all health probes we want the following to be true:

All Envoy (implicit or explicit) timeouts >= the timeout specified in
the pod deployment spec.

Changes:

* Remove connect timeouts from all clusters
* For health probe clusters we set the route timeout to be equal to the
timeout set on the kubernetes pod. This is only applied on HTTP routes
since plain tcp routes have infinite timeout.

Signed-off-by: Sotiris Nanopoulos <[email protected]>
@davinci26
Copy link
Author

@shashankram fixed. Thanks for the feedback on this PR offline, highly appreciate it!

@ksubrmnn ksubrmnn merged commit 3e727ed into openservicemesh:main Sep 27, 2021
snehachhabria pushed a commit to snehachhabria/osm that referenced this pull request Sep 27, 2021
…servicemesh#4149)

Fixes openservicemesh#4137

Overall we aim to maintain the following invariance:

For all health probes we want the following to be true:

All Envoy (implicit or explicit) timeouts >= the timeout specified in
the pod deployment spec.

Changes:

* Remove connect timeouts from all clusters
* For health probe clusters we set the route timeout to be equal to the
timeout set on the kubernetes pod. This is only applied on HTTP routes
since plain tcp routes have infinite timeout.

Signed-off-by: Sotiris Nanopoulos <[email protected]>
snehachhabria pushed a commit to snehachhabria/osm that referenced this pull request Oct 14, 2021
…servicemesh#4149)

Fixes openservicemesh#4137

Overall we aim to maintain the following invariance:

For all health probes we want the following to be true:

All Envoy (implicit or explicit) timeouts >= the timeout specified in
the pod deployment spec.

Changes:

* Remove connect timeouts from all clusters
* For health probe clusters we set the route timeout to be equal to the
timeout set on the kubernetes pod. This is only applied on HTTP routes
since plain tcp routes have infinite timeout.

Signed-off-by: Sotiris Nanopoulos <[email protected]>
Signed-off-by: Sneha Chhabria <[email protected]>
allenlsy pushed a commit to allenlsy/osm that referenced this pull request Dec 28, 2021
…servicemesh#4149)

Fixes openservicemesh#4137

Overall we aim to maintain the following invariance:

For all health probes we want the following to be true:

All Envoy (implicit or explicit) timeouts >= the timeout specified in
the pod deployment spec.

Changes:

* Remove connect timeouts from all clusters
* For health probe clusters we set the route timeout to be equal to the
timeout set on the kubernetes pod. This is only applied on HTTP routes
since plain tcp routes have infinite timeout.

Signed-off-by: Sotiris Nanopoulos <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OSM implicitly overwrites the timeouts of kubernetes probes
4 participants