Dashboards readiness probe #117

Sebor · 2021-11-02T12:56:50Z

Description

Add k8s readiness probe to opensearch-dashboards helm chart

Related issues

[Enhancement][opensearch-dashboards] Add k8s readiness probe #112

Check List

[ x ] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Alan Mangashev <[email protected]>

Sebor · 2021-11-02T12:57:13Z

Hi @TheAlgo, @peterzhuamazon, @smlx !
Could you review this PR?

peterzhuamazon · 2021-11-03T00:37:43Z

Hi @Sebor thanks for the PR could you please create an issue and link to this PR?
This would make it easier for people to check.

Thanks.

Sebor · 2021-11-03T17:05:53Z

Hi @peterzhuamazon!
I've linked the issue. But what about tests? I think an opensearch server also should be deployed, otherwise opensearch-dashboards always will fail due to readiness probe

peterzhuamazon · 2021-11-03T18:18:18Z

Hi @peterzhuamazon! I've linked the issue. But what about tests? I think an opensearch server also should be deployed, otherwise opensearch-dashboards always will fail due to readiness probe

Are you talking about your own testing?
Also I see LINT check failed again probably on version.

Sebor · 2021-11-03T18:33:55Z

Hi @peterzhuamazon! I've linked the issue. But what about tests? I think an opensearch server also should be deployed, otherwise opensearch-dashboards always will fail due to readiness probe

Are you talking about your own testing? Also I see LINT check failed again probably on version.

No I mean LINT check here, in PR (Lint and Test Charts / lint-test (pull_request) ):

Error: INSTALLATION FAILED: timed out waiting for the condition
...
Warning    Unhealthy    pod/opensearch-dashboards-6z2migwhtu-8cbd64465-tgb98    spec.containers{dashboards}   kubelet, chart-testing-control-plane   Readiness probe failed: Error: Got HTTP code 503 but expected a 200 
...
{"type":"log","@timestamp":"2021-11-03T00:52:41Z","tags":["error","opensearch","data"],"pid":1,"message":"[ConnectionError]: getaddrinfo EAI_AGAIN opensearch-cluster-master opensearch-cluster-master:9200"}

peterzhuamazon · 2021-11-03T18:50:05Z

Hi @peterzhuamazon! I've linked the issue. But what about tests? I think an opensearch server also should be deployed, otherwise opensearch-dashboards always will fail due to readiness probe

Are you talking about your own testing? Also I see LINT check failed again probably on version.

No I mean LINT check here, in PR (Lint and Test Charts / lint-test (pull_request) ):
Error: INSTALLATION FAILED: timed out waiting for the condition
...
Warning    Unhealthy    pod/opensearch-dashboards-6z2migwhtu-8cbd64465-tgb98    spec.containers{dashboards}   kubelet, chart-testing-control-plane   Readiness probe failed: Error: Got HTTP code 503 but expected a 200 
...
{"type":"log","@timestamp":"2021-11-03T00:52:41Z","tags":["error","opensearch","data"],"pid":1,"message":"[ConnectionError]: getaddrinfo EAI_AGAIN opensearch-cluster-master opensearch-cluster-master:9200"}

Seem like it. Good catch here.
Is it possible for you to add this functionalities in another PR?
Thanks.

peterzhuamazon · 2021-11-03T18:51:35Z

@mprimeaux any inputs? ^^

sastorsl · 2021-11-03T19:22:33Z

@Sebor I was wondering, do we really want to log on to opensearch in the readiness check?
I not sure I see the need to start taking down Opensearch Dashboards if the backend is unavailable for a spell?

A simple call to the root (or base path) will give you a 302 (redirect to login), and it will at least verify that that Opensearch Dashboards is responding.
I think that depending on the backend makes things more complicated than needs be, and instead of fixing the backend you also have to struggle with Dashboards going down and up.

What are your thoughts on this?

Sebor · 2021-11-04T11:30:27Z

Hi @sastorsl!
Readiness probe does not log on to opensearch. It checks dashboards only. And dashboards will always give 503 error without connection to opensearch.
You can test it locally:

run opensearch-dashboards container via docker:

docker run -d -p 5601:5601 opensearchproject/opensearch-dashboards

do curl request:

curl -v localhost:5601

The answer will be something like:

*   Trying 127.0.0.1:5601...
* Connected to localhost (127.0.0.1) port 5601 (#0)
> GET / HTTP/1.1
> Host: localhost:5601
> User-Agent: curl/7.76.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 503 Service Unavailable
< retry-after: 30
< content-type: text/html; charset=utf-8
< cache-control: private, no-cache, no-store, must-revalidate
< content-length: 45
< Date: Thu, 04 Nov 2021 11:02:26 GMT
< Connection: keep-alive
<
* Connection #0 to host localhost left intact
OpenSearch Dashboards server is not ready yet

see docker logs:

docker logs -f <container_name>

You'll see a lot of messages:

{"type":"log","@timestamp":"","tags":["error","opensearch","data"],"pid":1,"message":"[ConnectionError]: connect ECONNREFUSED 127.0.0.1:9200"}

That's why test fails. I.e. it is common behavior of dashboards and not readiness probe

sastorsl · 2021-11-04T12:30:48Z

@Sebor you are quite right.
I'll leave it in the open and let the record show, etc... :-)

peterzhuamazon · 2021-11-10T17:15:34Z

@DandyDeveloper @TheAlgo any idea how to proceed with this PR? Thanks.

DandyDeveloper · 2021-11-11T07:21:57Z

I need to investigate and approve if things look good later today. Also, we'll need to pull in master to add the other tests.

TheAlgo · 2021-11-11T08:15:50Z

Yes some investigation is required here to confirm the behaviour

Sebor · 2021-11-11T08:58:46Z

I can propose "dirty" workaround: to add flag to disable\enable readiness probe in chart (disabled by default). For example in values.yaml:

readinessProbe:
  enabled: false

What do you think?

DandyDeveloper · 2021-11-15T00:12:21Z

@Sebor I don't like the dirty option here, because it's technically working around the issue.

I'm not entirely opposed to adding a enabled: false to the CI, but then you avoid testing this entirely, which defeats the purpose of the readiness probes.

Consider this is testing for readiness, a better option would be to include opensearch as a dependency in the workflow and deploy a single node cluster alongside the opensearch-dashboards, so we have a working stack and a "full test" of your readiness probes.

peterzhuamazon · 2021-11-17T17:56:15Z

@Sebor any progress to follow up with what @DandyDeveloper has mentioned above?
Thanks.

Sebor · 2021-11-17T20:07:08Z

Hi @peterzhuamazon!
I don't quite understand what is expected of me

peterzhuamazon · 2021-11-20T02:35:43Z

Hi @peterzhuamazon! I don't quite understand what is expected of me

@DandyDeveloper could you clear up what is the next step that @Sebor can proceed with this PR?
It has been some time since we have this PR.

Thanks.

DandyDeveloper · 2021-11-22T00:19:18Z

@Sebor @peterzhuamazon I was hoping other opinions would weigh in.

I think we should be completely testing this otherwise we're going to get issues down the road.

To implement this properly, we need to have a proper CI mechanism that matches the default values. This means;

DIsable is by default in the values AND therefore the CI and include a note in the README the impact this has by enabling
OR
We find a way to remove the dependency of Opensearch for the readiness in the Dashboards.

TheAlgo · 2021-11-22T05:46:59Z

I feel it is difficult to remove the dependency of dashboards for readiness unless we don't want to go with the standard way to get the health for dashboards. We can find some other variant which can help determine readiness but that can be ambiguous and will require some POC to be done.

Just thinking out loud. What if we can in our CI first install OpenSearch and then do OpenSearch Dashboards everytime we have a change in OpenSearch Dashboards. Opinions?

DandyDeveloper · 2021-11-22T06:53:44Z

@TheAlgo I don't like adding a dependency of having Opensearch deployed, especially in the CI, for the Opensearch Dashboard tests to run. Its a simple solution, but we're worse off long term.

sastorsl · 2021-11-22T07:15:24Z

Is there any resource which is available in OpenSearch Dashboards even if OpenSearch is not available? An image, favicon, html file, anything?

Sebor · 2021-11-23T14:32:29Z

Is there any resource which is available in OpenSearch Dashboards even if OpenSearch is not available? An image, favicon, html file, anything?

Simple text only:

$ curl -vL 127.1:5601
*   Trying 127.0.0.1:5601...
* Connected to 127.0.0.1 (127.0.0.1) port 5601 (#0)
> GET / HTTP/1.1
> Host: 127.0.0.1:5601
> User-Agent: curl/7.77.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 503 Service Unavailable
< retry-after: 30
< content-type: text/html; charset=utf-8
< cache-control: private, no-cache, no-store, must-revalidate
< content-length: 45
< Date: Tue, 23 Nov 2021 14:29:43 GMT
< Connection: keep-alive
<
* Connection #0 to host 127.0.0.1 left intact
OpenSearch Dashboards server is not ready yet

and 503 HTTP response code

peterzhuamazon · 2021-12-09T00:44:59Z

It seems this PR has been stuck for too long.
What would be a way for this to get approved?
@DandyDeveloper @TheAlgo

Thanks.

DandyDeveloper · 2021-12-09T00:59:01Z

@peterzhuamazon I don't have an alternative option to how this could be better at this point.

I think deeper considerations are needed on pulling the trigger on this. Adding the CI dependency is not a great idea to me.

Having an option to disable this in the CI also just hides the problem.

What about raising this on the engineering side to see if a better mechanism for readiness (unauthenticated) can be made available in Opensearch Dashboards, one that also doesn't depend on the upstream Opensearch to confirm readines?

sastorsl · 2021-12-13T07:44:40Z

Repeating my question of last...

Is there any resource which is available in OpenSearch Dashboards even if OpenSearch is not available? An image, favicon, html file, anything?

sastorsl · 2021-12-13T08:06:28Z

I attempted to call various resources such as /favicons/favicon.ico, etc and also got only HTTP response 503.

curl -f -v http://localhost:5601/bootstrap.js
curl -f -v http://localhost:5601/favicon.ico
curl -f -v http://localhost:5601/favicons/favicon.ico
curl -f -v http://localhost:5601/app/home
curl -f -v http://localhost:5601/auth/
curl -f -v http://localhost:5601/auth/openid/
curl -f -v http://localhost:5601/auth/openid/login/

But there is one thing, and that is the port is listening, and actually determines that a HTTP response code 503 is a valid response.

When you connect to an invalid port curl returns errorcode 7 Failed to connect to host.
When connecting to port 5601 (default port) with curl ... you get HTTP response 503, but returncode 0 or returncode 22 with curl -f ....

It is highly inconclusive - BUT there is something listening if you get at least a return code 0 from curl.
It is a start.

However I agree with @DandyDeveloper , this is really something the opensearch-dashboards team should look into or answer.
Present one alternative URL where the application returns 200, while anything else requires a backend probe.

localhost:5601/health  # Should return 200 as long as the application is up
localhost:5601/<anything-else>  # Should only return 200 (or 30x, 40x) if the backend is available

TheAlgo · 2021-12-13T08:22:57Z

I attempted to call various resources such as /favicons/favicon.ico, etc and also got only HTTP response 503.
curl -f -v http://localhost:5601/bootstrap.js
curl -f -v http://localhost:5601/favicon.ico
curl -f -v http://localhost:5601/favicons/favicon.ico
curl -f -v http://localhost:5601/app/home
curl -f -v http://localhost:5601/auth/
curl -f -v http://localhost:5601/auth/openid/
curl -f -v http://localhost:5601/auth/openid/login/
But there is one thing, and that is the port is listening, and actually determines that a 503 is a valid response.

When you connect to an invalid port curl returns errorcode 7 Failed to connect to host and returncode 0. When connecting to port 5601 (default port) with curl ... you get return code 0 and with curl -f ... you get a returncode 22.

It is highly inconclusive - BUT there is something listening if you get at least a return code 0 from curl. It is a start.

But I agree with @DandyDeveloper , this is really something the opensearch-dashboards team should look into or answer. Present one alternative URL where the application returns 200, while anything else requires a backend probe.
localhost:5601/health  # Should return 200 as long as the application is up
localhost:5601/<anything-else>  # Should only return 200 (or 30x, 40x) if the backend is available

@CEHENKLE Can someone from Dashboards team give some guidance here?

peterzhuamazon · 2021-12-16T17:15:21Z

I attempted to call various resources such as /favicons/favicon.ico, etc and also got only HTTP response 503.
curl -f -v http://localhost:5601/bootstrap.js
curl -f -v http://localhost:5601/favicon.ico
curl -f -v http://localhost:5601/favicons/favicon.ico
curl -f -v http://localhost:5601/app/home
curl -f -v http://localhost:5601/auth/
curl -f -v http://localhost:5601/auth/openid/
curl -f -v http://localhost:5601/auth/openid/login/
But there is one thing, and that is the port is listening, and actually determines that a 503 is a valid response.
When you connect to an invalid port curl returns errorcode 7 Failed to connect to host and returncode 0. When connecting to port 5601 (default port) with curl ... you get return code 0 and with curl -f ... you get a returncode 22.
It is highly inconclusive - BUT there is something listening if you get at least a return code 0 from curl. It is a start.
But I agree with @DandyDeveloper , this is really something the opensearch-dashboards team should look into or answer. Present one alternative URL where the application returns 200, while anything else requires a backend probe.
localhost:5601/health  # Should return 200 as long as the application is up
localhost:5601/<anything-else>  # Should only return 200 (or 30x, 40x) if the backend is available
@CEHENKLE Can someone from Dashboards team give some guidance here?

@tmarkley @kavilla Could use some help here for dashboards.
Thanks.

kavilla · 2022-01-03T22:59:39Z

Hey! OpenSearch Dashboards depends on OpenSearch to do anything so with that said having OpenSearch running is a major part of the startup process for OpenSearch Dashboards. If there is no OpenSearch then it won't try to serve up anything. So arguably, 503 is the correct code and implementing something so that it returns a 200 seems no different than:

readinessProbe:
  enabled: false

I think the correct solution would be adding the dependency.

sastorsl · 2022-01-04T07:41:18Z

Hey! OpenSearch Dashboards depends on OpenSearch to do anything so with that said having OpenSearch running is a major part of the startup process for OpenSearch Dashboards. If there is no OpenSearch then it won't try to serve up anything. So arguably, 503 is the correct code and implementing something so that it returns a 200 seems no different than:

To actually be able to do something OpenSearch Dashboards requires Opensearch. Still, there is a need to determine if OpenSearch Dashboard itself is ready.

"The troops are ready, just waiting for the general, Sir!"

tmarkley · 2022-01-04T17:14:19Z

Still, there is a need to determine if OpenSearch Dashboard itself is ready.

@sastorsl As the OpenSearch Dashboards code base exists now, the server fails to start and is not ready in any way if OpenSearch is not running. Do we need to further discuss that point in this PR? Maybe we should be asking what the purpose of this readiness probe is?

As a maintainer of Dashboards I don't believe this is as trivial as adding a "health" API because it wouldn't properly represent the state of the application. If anything we should be checking the health of OpenSearch before Dashboards is started.

sastorsl · 2022-01-04T18:04:16Z

@tmarkley there are comments as to why a minimum test of OpenSearch Dasboards is both desirable and necessary but I'll try to summarize.

CI/CD test just OpenSearch Dashboards and see if it is broken.
Able to deploy OpenSearch Dashboards in an unordered sequence and leave it open until OpenSearch is available.
OpenSearch Dashboards not going down if OpenSearch goes down.

OpenSearch Dashboards requires the backend / OpenSearch to function properly, but still it is a standalone application and should be able to simply report that it is up but that is is missing it's backend.

OpenSearch Dashboards going down when OpenSearch goes down also adds to what must be restarted / checked.

When automating deployment it makes sense to have separate deployment pipelines for OpenSearch and OpenSearch Dashboards since there are two helm charts and they receive updates at uneven intervals. This also means that at first deploy Dashboards might be ready first - and probably is, but then it goes down and up until OpenSearch comes up which needlessly stresses systems and also adds to what ops have to follow up / search for errors in.

These are real life experiences in my case, and I would guess for others as well.

tmarkley · 2022-01-05T17:27:14Z

@sastorsl Thanks for that information, that helps clarify a few things for me. I think these are reasonable requests, but they're going to require some discussion about the path forward for OpenSearch Dashboards. Dashboards is a standalone application is some ways, but historically it's always been coupled to ES/OpenSearch. I don't immediately know the scope required to change this status quo, but again I don't think it's trivial. Do you mind opening an Issue in the OpenSearch Dashboards repo so that we can discuss the options, gauge community interest, and determine the scope of changes that might be required to reach this desired state?

TheAlgo · 2022-01-05T17:47:09Z

@tmarkley Just thinking out loud. At the end OpenSearch Dashboards is a Node JS Server so ideally there should be a way out to check the health of it.

This thing might be helpful in creating Kubernetes operator as well I feel 🤔

tmarkley · 2022-01-05T18:01:24Z

Just thinking out loud. At the end OpenSearch Dashboards is a Node JS Server so ideally there should be a way out to check the health of it.

Yeah, I agree that it's a reasonable thing to expect. A method for that just doesn't currently exist.

Sebor added 2 commits November 2, 2021 15:49

Add configurable readinessProbe

e608502

Signed-off-by: Alan Mangashev <[email protected]>

Update CHANGELOG and bump helm chart version for opensearch-dashboards

0e1227f

Signed-off-by: Alan Mangashev <[email protected]>

peterzhuamazon requested review from DandyDeveloper and TheAlgo November 3, 2021 00:36

peterzhuamazon mentioned this pull request Nov 3, 2021

Weekly Community Engagement on 2021/10/25 - 2021/11/8 opensearch-project/opensearch-build#800

Closed

peterzhuamazon linked an issue Nov 3, 2021 that may be closed by this pull request

[Enhancement][opensearch-dashboards] Add k8s readiness probe #112

Closed

peterzhuamazon mentioned this pull request Nov 3, 2021

[Enhancement][opensearch][opensearch-dashboards] Lifecycle Probes #44

Open

Sebor added 2 commits November 11, 2021 12:01

Merge branch 'main' into dashboards-readiness-probe

c56e892

Merge branch 'main' into dashboards-readiness-probe

4d6d329

TheAlgo mentioned this pull request Sep 27, 2022

Add healthcheck probes to charts #329

Merged

3 tasks

faabsen mentioned this pull request Feb 9, 2023

[Enhancement][opensearch-dashboards] Add k8s readiness probe #112

Closed

This pull request was closed.

Dashboards readiness probe #117

Dashboards readiness probe #117

Conversation

Sebor commented Nov 2, 2021 • edited Loading

Description

Related issues

Check List

Sebor commented Nov 2, 2021

peterzhuamazon commented Nov 3, 2021

Sebor commented Nov 3, 2021

peterzhuamazon commented Nov 3, 2021

Sebor commented Nov 3, 2021

peterzhuamazon commented Nov 3, 2021

peterzhuamazon commented Nov 3, 2021

sastorsl commented Nov 3, 2021

Sebor commented Nov 4, 2021

sastorsl commented Nov 4, 2021

peterzhuamazon commented Nov 10, 2021

DandyDeveloper commented Nov 11, 2021 • edited Loading

TheAlgo commented Nov 11, 2021

Sebor commented Nov 11, 2021

DandyDeveloper commented Nov 15, 2021

peterzhuamazon commented Nov 17, 2021

Sebor commented Nov 17, 2021

peterzhuamazon commented Nov 20, 2021

DandyDeveloper commented Nov 22, 2021

TheAlgo commented Nov 22, 2021

DandyDeveloper commented Nov 22, 2021

sastorsl commented Nov 22, 2021

Sebor commented Nov 23, 2021

peterzhuamazon commented Dec 9, 2021

DandyDeveloper commented Dec 9, 2021

sastorsl commented Dec 13, 2021

sastorsl commented Dec 13, 2021 • edited Loading

TheAlgo commented Dec 13, 2021

peterzhuamazon commented Dec 16, 2021

kavilla commented Jan 3, 2022

sastorsl commented Jan 4, 2022

tmarkley commented Jan 4, 2022

sastorsl commented Jan 4, 2022

tmarkley commented Jan 5, 2022

TheAlgo commented Jan 5, 2022

tmarkley commented Jan 5, 2022

Sebor commented Nov 2, 2021 •

edited

Loading

DandyDeveloper commented Nov 11, 2021 •

edited

Loading

sastorsl commented Dec 13, 2021 •

edited

Loading