OTA-1521: Add a default-deny network policy for CVO namespace #1198

petr-muller
Member

@petr-muller petr-muller commented May 27, 2025

Add a baseline NetworkPolicy to deny all network communication (both ingress and egress) to all pods in the namespace. Any necessary network traffic needs to be allowed by an additional NetworkPolicy resource (they are additive).
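For reference, a policy of this shape can be sketched as below. It matches the `oc describe` output later in this thread (name `default-deny`, empty pod selector, both policy types, no rules), but it is a reconstruction, not necessarily the exact manifest this PR adds, and `default_deny_policy` is a hypothetical helper name.

```shell
# Hypothetical helper (POSIX sh): print a default-deny NetworkPolicy for namespace $1.
# An empty podSelector selects every pod in the namespace; listing both
# policyTypes without any ingress/egress rules denies all traffic in both
# directions. Further NetworkPolicies can only add allowed traffic on top.
default_deny_policy() {
  local ns="${1:?usage: default_deny_policy NAMESPACE}"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: ${ns}
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
EOF
}
```

Usage, assuming a logged-in `oc`: `default_deny_policy openshift-cluster-version | oc apply -f -`.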

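At the moment, the default deny all policy should be the only one needed:
- CVO is host-networked so it is not affected by network policies
- Bare version pods spawned by CVO do not require any network communication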

See OTA Network Policies Working Document for more information.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 27, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented May 27, 2025

@petr-muller: This pull request references OTA-1531 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

For now just a testing PR to see what blows up.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 27, 2025
@petr-muller petr-muller changed the title OTA-1531: Add a default-deny network policy for CVO namespace OTA-1521: Add a default-deny network policy for CVO namespace May 27, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented May 27, 2025

@petr-muller: This pull request references OTA-1521 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

For now just a testing PR to see what blows up.

@hongkailiu
Member

hongkailiu commented Jun 3, 2025

what blows up

I expected none of the e2e tests to succeed (as CVO would be broken by "Deny ALL"), but the situation is much better.
Is that because of hostNetwork: true?

@hongkailiu
Member

/cc

@openshift-ci openshift-ci bot requested a review from hongkailiu June 3, 2025 14:04
@petr-muller
Member Author

I expected none of the e2e tests to succeed (as CVO would be broken by "Deny ALL"), but the situation is much better.
Is that because of hostNetwork: true?

Yeah, check out the Pragmatic (CVO uses host network) section of the working doc. Network Policies do not affect most traffic of host-networked pods. They definitely do not affect the core functionality (being a k8s controller, i.e. talking to the apiserver). We will still need to test all functionality that uses network communication (see the working doc) to be sure, though: it is also possible that we simply do not have tests that exercise that functionality.
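To see which pods in a namespace a NetworkPolicy would actually restrict, one can list the pods that are not host-networked. A minimal sketch, assuming a logged-in `oc`; the function names are hypothetical, and the pure filter is kept separate from the `oc` call so it can be checked without a cluster.

```shell
# Pure filter: reads "NAME HOSTNETWORK" rows (no header) on stdin and prints
# the names of pods whose hostNetwork column is not "true". Those pods use the
# pod network, so NetworkPolicies apply to them.
not_host_networked() {
  awk '$2 != "true" { print $1 }'
}

# Cluster-facing wrapper (requires oc). hostNetwork is omitted from the pod
# spec when false, so custom-columns prints a placeholder such as <none>.
list_policy_affected_pods() {
  local ns="${1:?usage: list_policy_affected_pods NAMESPACE}"
  oc get pods -n "$ns" --no-headers \
    -o custom-columns=NAME:.metadata.name,HOSTNETWORK:.spec.hostNetwork |
    not_host_networked
}
```

Any pod printed by `list_policy_affected_pods openshift-cluster-version` is isolated by the default-deny policy; host-networked pods such as CVO are filtered out.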

@petr-muller petr-muller changed the title OTA-1521: Add a default-deny network policy for CVO namespace OTA-1499: Add a default-deny network policy for CVO namespace Jun 3, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented Jun 3, 2025

@petr-muller: This pull request references OTA-1499 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.20.0" version, but no target version was set.

In response to this:

For now just a testing PR to see what blows up.

Add a baseline NetworkPolicy to deny all network communication (both
ingress and egress) to all pods in the namespace. Any necessary network
traffic needs to be allowed by an additional NetworkPolicy resource
(they are additive).

At the moment, the default deny all policy should be the only one needed:
- CVO is host-networked so it is [not affected by network policies](https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/networking/network-security#network-policy)
- Bare `version` pods spawned by CVO do not require any network communication

See [OTA Network Policies Working Document](https://docs.google.com/document/d/1Dzr3eYGVl6OBxqfUohugJLsbsn7sYrC3fN6yCe8zTRQ/edit?tab=t.0#heading=h.9vehq2liufe) for more information.
@petr-muller petr-muller force-pushed the ota-1521-add-deny-all-network-policy branch from 76b22be to 07c6b5c Compare June 10, 2025 15:55
@petr-muller petr-muller changed the title OTA-1499: Add a default-deny network policy for CVO namespace OTA-1521: Add a default-deny network policy for CVO namespace Jun 10, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented Jun 10, 2025

@petr-muller: This pull request references OTA-1499 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.20.0" version, but no target version was set.

In response to this:

Add a baseline NetworkPolicy to deny all network communication (both
ingress and egress) to all pods in the namespace. Any necessary network
traffic needs to be allowed by an additional NetworkPolicy resource
(they are additive).

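At the moment, the default deny all policy should be the only one needed:
- CVO is host-networked so it is not affected by network policies
- Bare version pods spawned by CVO do not require any network communication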

See OTA Network Policies Working Document for more information.

@openshift-ci-robot
Contributor

openshift-ci-robot commented Jun 10, 2025

@petr-muller: This pull request references OTA-1521 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

Add a baseline NetworkPolicy to deny all network communication (both
ingress and egress) to all pods in the namespace. Any necessary network
traffic needs to be allowed by an additional NetworkPolicy resource
(they are additive).

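At the moment, the default deny all policy should be the only one needed:
- CVO is host-networked so it is not affected by network policies
- Bare version pods spawned by CVO do not require any network communication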

See OTA Network Policies Working Document for more information.

@petr-muller
Member Author

petr-muller commented Jun 18, 2025

Testing with scripts from https://github.com/petr-muller/vibes/tree/main/scripts shows that the default NetworkPolicy correctly prevents pods from communicating:

$ ./scripts/test-networkpolicy-external-access.fish openshift-cluster-version
Testing NetworkPolicy external access blocking in namespace: openshift-cluster-version
...
✅ TEST PASSED: NetworkPolicy is blocking external access
External access to http://networkpolicy-netcat-route-openshift-cluster-version.apps.ci-ln-vfv2bgt-76ef8.aws-2.ci.openshift.org was blocked (as expected)

$ ./scripts/test-networkpolicy-isolation.fish openshift-cluster-version
Testing NetworkPolicy isolation in namespace: openshift-cluster-version
...
✅ TEST PASSED: NetworkPolicy is blocking outbound traffic
The pod was unable to reach https://www.google.com (as expected)

$ ./scripts/test-networkpolicy-pod-to-pod.fish openshift-cluster-version
Testing NetworkPolicy pod-to-pod blocking in namespace: openshift-cluster-version
...
✅ TEST PASSED: NetworkPolicy is blocking pod-to-pod communication
Client pod was unable to reach server pod via service (as expected)
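The three scripts can also be run back to back. A rough sketch, assuming a checkout of petr-muller/vibes with `fish` installed and assuming that a passing run prints a line containing `TEST PASSED`, as in the output above (the scripts' exit codes are not relied on). `run_np_checks` and `reported_pass` are hypothetical names. Each script's output is captured in full rather than piped into `grep -q`, so an early grep exit cannot interrupt a script before its cleanup step.

```shell
# Pure check: succeeds if the text on stdin reports a passing test.
reported_pass() {
  grep -q 'TEST PASSED'
}

# Runs the three scripts against namespace $1, printing PASS/FAIL per script.
# Returns non-zero if any script did not report a pass.
run_np_checks() {
  local ns="${1:?usage: run_np_checks NAMESPACE}" failed=0 s out
  for s in test-networkpolicy-external-access \
           test-networkpolicy-isolation \
           test-networkpolicy-pod-to-pod; do
    # Capture the whole output so the script always runs to completion.
    out=$(./scripts/"$s".fish "$ns" 2>&1)
    if printf '%s\n' "$out" | reported_pass; then
      echo "PASS $s"
    else
      echo "FAIL $s"
      failed=1
    fi
  done
  return "$failed"
}
```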

@petr-muller
Member Author

I inspected CVO's log for signs of networking issues (connection refused, i/o timeout, no route to host, etc.) that could indicate NetworkPolicy interference and found none (I also gave the log to Gemini to analyze, and it found no issues either).

The up{namespace="openshift-cluster-version", job="cluster-version-operator"} metric (which shows whether Prometheus is successfully scraping the target specified for CVO) has a value of 1, which indicates that the monitoring stack is able to scrape our metrics.

I used the fauxinnati update service with the smoke-test channel, which shows that CVO is able to retrieve its update graph from an external OSUS. Because the smoke-test channel contains risks to be evaluated, this also tests that CVO can evaluate PromQL using the cluster monitoring stack:

$ oc adm upgrade recommend --version 4.20.10
Upstream: https://fauxinnati-fauxinnati.apps.ota-stage.q2z4.p1.openshiftapps.com/api/upgrades_info/graph
Channel: smoke-test

Update to 4.20.10 Recommended=False:
Image: quay.io/openshift-release-dev/ocp-release@sha256:00000000000000000000000000000000000000000000000000000000003d572a
Release URL: https://access.redhat.com/errata/RHSA-2024:06010
Reason: MultipleReasons
Message: This is RiskA part of combined risks for smoke testing https://docs.openshift.com/synthetic-risk-smoke-combined-a
  
  This is RiskBMatches part of combined risks for smoke testing https://docs.openshift.com/synthetic-risk-smoke-combined-b

Lastly, I updated (actually downgraded) the cluster to 4.20.0-ec.2 using its digest:

$ oc adm upgrade --to-image quay.io/openshift-release-dev/ocp-release@sha256:7f885da9a26ec7f460a4518a40811c03cf278853186833dc80e96a1ae15c9511 --allow-explicit-upgrade

The update started successfully, which shows that CVO is able to fetch the signatures it needs from public cloud storage (without --force, CVO would still refuse to update to an unverified payload). It also shows that the NetworkPolicy does not interfere with the version Pod (which itself does not need any network communication).

@dis016

dis016 commented Jul 2, 2025

Test Scenario: CVO NetworkPolicy is blocking outbound traffic
Install a cluster

 
dinesh@Dineshs-MacBook-Pro vibes % oc get clusterversion                                                 
NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.0-0-2025-07-02-151812-test-ci-ln-mjn77b2-latest   True        False         41m     Cluster version is 4.20.0-0-2025-07-02-151812-test-ci-ln-mjn77b2-latest
dinesh@Dineshs-MacBook-Pro vibes %

Execute the test-networkpolicy-isolation.fish script from https://github.com/petr-muller/vibes/blob/main/scripts/test-networkpolicy-isolation.fish

dinesh@Dineshs-MacBook-Pro vibes % ./scripts/test-networkpolicy-isolation.fish openshift-cluster-version 
Testing NetworkPolicy isolation in namespace: openshift-cluster-version
Creating Pod YAML...
Pod YAML content:
apiVersion: v1
kind: Pod
metadata:
  name: networkpolicy-test-pod
  namespace: openshift-cluster-version
spec:
  restartPolicy: Never
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: networkpolicy-test-pod
    image: quay.io/curl/curl:latest
    command: ["curl", "-s", "--connect-timeout", "10", "--max-time", "15", "https://www.google.com"]
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault
Applying Pod manifest...
Waiting for pod to complete curl attempt...
✅ TEST PASSED: NetworkPolicy is blocking outbound traffic
The pod was unable to reach https://www.google.com (as expected)
dinesh@Dineshs-MacBook-Pro vibes %
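This test passes when the pod's `curl` fails. When checking such a pod by hand, its exit code hints at how the request failed. A minimal sketch: 6, 7 and 28 are curl's real exit codes for "couldn't resolve host", "couldn't connect" and "operation timed out", while the helper names and the verdict wording are illustrative assumptions. A non-zero code alone does not prove the policy is the cause (a DNS problem unrelated to the policy would also fail), so it is a hint rather than proof.

```shell
# Maps a curl exit code to a short verdict for an "egress should be blocked" check.
# Every non-zero code is treated as blocked here.
egress_verdict() {
  case "$1" in
    "") echo "unknown: no exit code (pod not terminated?)" ;;
    0)  echo "NOT blocked: request succeeded" ;;
    6)  echo "blocked: could not resolve host" ;;
    7)  echo "blocked: failed to connect" ;;
    28) echo "blocked: timed out" ;;
    *)  echo "blocked: curl exit code $1" ;;
  esac
}

# Cluster-facing (requires oc): exit code of the terminated test pod's first
# container. The default pod name is the one from the manifest above.
pod_exit_code() {
  local ns="${1:?usage: pod_exit_code NAMESPACE [POD]}" pod="${2:-networkpolicy-test-pod}"
  oc get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}'
}
```

Usage, while the test pod still exists: `egress_verdict "$(pod_exit_code openshift-cluster-version)"`.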

@dis016

dis016 commented Jul 2, 2025

Test Scenario: CVO NetworkPolicy is blocking pod-to-pod communication
Install a cluster

dinesh@Dineshs-MacBook-Pro vibes % oc get clusterversion                                                 
NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.0-0-2025-07-02-151812-test-ci-ln-mjn77b2-latest   True        False         41m     Cluster version is 4.20.0-0-2025-07-02-151812-test-ci-ln-mjn77b2-latest
dinesh@Dineshs-MacBook-Pro vibes %

Execute the test-networkpolicy-pod-to-pod.fish script from https://github.com/petr-muller/vibes/blob/main/scripts/test-networkpolicy-pod-to-pod.fish


dinesh@Dineshs-MacBook-Pro vibes % ./scripts/test-networkpolicy-pod-to-pod.fish openshift-cluster-version  
Testing NetworkPolicy pod-to-pod blocking in namespace: openshift-cluster-version
Creating Server Pod YAML...
Creating Client Pod YAML...
Creating Service YAML...
Server Pod YAML content:
apiVersion: v1
kind: Pod
metadata:
  name: networkpolicy-server-pod
  namespace: openshift-cluster-version
  labels:
    app: networkpolicy-server
spec:
  restartPolicy: Never
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: http-server
    image: python:3.12-alpine
    command: ["python3", "-m", "http.server", "8080"]
    ports:
    - containerPort: 8080
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault

Client Pod YAML content:
apiVersion: v1
kind: Pod
metadata:
  name: networkpolicy-client-pod
  namespace: openshift-cluster-version
  labels:
    app: networkpolicy-client
spec:
  restartPolicy: Never
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: curl-client
    image: quay.io/curl/curl:latest
    command: ["sh", "-c", "sleep 30 && curl -s --connect-timeout 10 --max-time 15 http://networkpolicy-server-service:8080"]
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault

Service YAML content:
apiVersion: v1
kind: Service
metadata:
  name: networkpolicy-server-service
  namespace: openshift-cluster-version
spec:
  selector:
    app: networkpolicy-server
  ports:
  - port: 8080
    targetPort: 8080
  type: ClusterIP
Applying Server Pod manifest...
Applying Service manifest...
Applying Client Pod manifest...
Waiting for server pod to be ready...
Waiting for client pod to be ready...
Waiting for client pod to complete curl attempt to server service...
✅ TEST PASSED: NetworkPolicy is blocking pod-to-pod communication
Client pod was unable to reach server pod via service (as expected)
Cleaning up resources...
dinesh@Dineshs-MacBook-Pro vibes %

@dis016

dis016 commented Jul 2, 2025

Test Scenario: External (ingress) access to pods in the openshift-cluster-version namespace is blocked
Install a cluster

dinesh@Dineshs-MacBook-Pro vibes % oc get clusterversion                                                 
NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.0-0-2025-07-02-151812-test-ci-ln-mjn77b2-latest   True        False         41m     Cluster version is 4.20.0-0-2025-07-02-151812-test-ci-ln-mjn77b2-latest
dinesh@Dineshs-MacBook-Pro vibes %

Execute the test-networkpolicy-external-access.fish script from https://github.com/petr-muller/vibes/blob/main/scripts/test-networkpolicy-external-access.fish

dinesh@Dineshs-MacBook-Pro vibes % ./scripts/test-networkpolicy-external-access.fish openshift-cluster-version  
Testing NetworkPolicy external access blocking in namespace: openshift-cluster-version
Creating Pod YAML...
Creating Service YAML...
Creating Route YAML...
Pod YAML content:
apiVersion: v1
kind: Pod
metadata:
  name: networkpolicy-netcat-server
  namespace: openshift-cluster-version
  labels:
    app: networkpolicy-test
spec:
  restartPolicy: Never
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: http-server
    image: python:3.12-alpine
    command: ["python3", "-m", "http.server", "8080"]
    ports:
    - containerPort: 8080
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault

Service YAML content:
apiVersion: v1
kind: Service
metadata:
  name: networkpolicy-netcat-service
  namespace: openshift-cluster-version
spec:
  selector:
    app: networkpolicy-test
  ports:
  - port: 8080
    targetPort: 8080
  type: ClusterIP

Route YAML content:
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: networkpolicy-netcat-route
  namespace: openshift-cluster-version
spec:
  to:
    kind: Service
    name: networkpolicy-netcat-service
  port:
    targetPort: 8080
Applying Pod manifest...
Applying Service manifest...
Applying Route manifest...
Waiting for pod to be ready...
Attempting to curl http://networkpolicy-netcat-route-openshift-cluster-version.apps.ci-ln-mjn77b2-1d09d.ci.azure.devcluster.openshift.com from local machine (this should fail if NetworkPolicy is working)...
✅ TEST PASSED: NetworkPolicy is blocking external access
External access to http://networkpolicy-netcat-route-openshift-cluster-version.apps.ci-ln-mjn77b2-1d09d.ci.azure.devcluster.openshift.com was blocked (as expected)
Cleaning up resources...
dinesh@Dineshs-MacBook-Pro vibes % 
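The URL the script curls follows OpenShift's default host naming for a Route without an explicit `spec.host`: `<route-name>-<namespace>.<ingress domain>`. A small sketch to reconstruct it, e.g. to repeat the curl by hand while the test resources still exist; the function names are hypothetical, and the ingress domain is read from the cluster `Ingress` config.

```shell
# Pure helper: default Route host for route $1 in namespace $2 under domain $3.
route_host() {
  printf '%s-%s.%s\n' "${1:?route}" "${2:?namespace}" "${3:?domain}"
}

# Cluster-facing (requires oc): the cluster ingress domain,
# e.g. apps.<cluster>.<base domain>.
ingress_domain() {
  oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}'
}
```

Usage: `curl -s --max-time 15 "http://$(route_host networkpolicy-netcat-route openshift-cluster-version "$(ingress_domain)")"`.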

@dis016

dis016 commented Jul 3, 2025

Test Scenario: Ingress to CVO is not broken for monitoring scrapes

  1. Install a 4.20 cluster
dinesh@Dineshs-MacBook-Pro ~ % oc get clusterversion
NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.0-0-2025-07-02-151812-test-ci-ln-mjn77b2-latest   True        False         13m     Cluster version is 4.20.0-0-2025-07-02-151812-test-ci-ln-mjn77b2-latest
dinesh@Dineshs-MacBook-Pro ~ % 
  1. Prometheus is able to communicate with CVO, and the monitoring scrape should be healthy (value "1")
dinesh@Dineshs-MacBook-Pro ~ % PROM_POD=$(oc get pods -n openshift-monitoring -l app.kubernetes.io/name=prometheus -o jsonpath='{.items[0].metadata.name}')

oc exec -n openshift-monitoring "$PROM_POD" -- \
  curl -s "http://localhost:9090/api/v1/query?query=up%7Bnamespace%3D%22openshift-cluster-version%22%2Cjob%3D%22cluster-version-operator%22%7D" | jq

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "up",
          "container": "cluster-version-operator",
          "endpoint": "metrics",
          "instance": "10.0.0.4:9099",
          "job": "cluster-version-operator",
          "namespace": "openshift-cluster-version",
          "pod": "cluster-version-operator-7b55dd47b7-4fwf4",
          "service": "cluster-version-operator"
        },
        "value": [
          1751477706.255,
          "1"
        ]
      }
    ]
  }
}
dinesh@Dineshs-MacBook-Pro ~ % 
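The query string in the curl above is URL-encoded by hand. A small encoder reproduces it exactly; this is a sketch (the `urlencode` name is hypothetical, and only ASCII input is handled). Alternatively, `curl -G --data-urlencode 'query=...'` lets curl do the encoding itself.

```shell
# Percent-encodes $1, keeping only RFC 3986 unreserved characters as-is.
# ASCII input only: multi-byte characters are not handled.
urlencode() {
  local s="$1" out="" c
  while [ -n "$s" ]; do
    c="${s%"${s#?}"}"   # first character of s
    s="${s#?}"          # drop it from s
    case "$c" in
      [[:alnum:].~_-]) out="$out$c" ;;
      # printf's "'c" argument form yields the character's numeric code.
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s\n' "$out"
}
```

For example, `urlencode 'up{namespace="openshift-cluster-version",job="cluster-version-operator"}'` yields the same encoded string used in the query above.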

@dis016

dis016 commented Jul 3, 2025

Test Scenario: Installation should succeed with the new NetworkPolicy, and CVO should be healthy.

  1. Install a cluster with the network policy using cluster-bot:
launch 4.20,openshift/cluster-version-operator#1198 azure

The cluster should install successfully

dinesh@Dineshs-MacBook-Pro ~ % oc get clusterversion
NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.0-0-2025-07-02-151812-test-ci-ln-mjn77b2-latest   True        False         13m     Cluster version is 4.20.0-0-2025-07-02-151812-test-ci-ln-mjn77b2-latest
dinesh@Dineshs-MacBook-Pro ~ % 
  1. The default-deny network policy should exist in the CVO namespace, with no additional ingress/egress rules allowing any traffic.
dinesh@Dineshs-MacBook-Pro ~ % oc get networkpolicy -n openshift-cluster-version
NAME           POD-SELECTOR   AGE
default-deny   <none>         41m
dinesh@Dineshs-MacBook-Pro ~ % oc describe networkpolicy default-deny -n openshift-cluster-version   
Name:         default-deny
Namespace:    openshift-cluster-version
Created on:   2025-07-02 21:24:27 +0530 IST
Labels:       <none>
Annotations:  <none>
Spec:
  PodSelector:     <none> (Allowing the specific traffic to all pods in this namespace)
  Allowing ingress traffic:
    <none> (Selected pods are isolated for ingress connectivity)
  Allowing egress traffic:
    <none> (Selected pods are isolated for egress connectivity)
  Policy Types: Ingress, Egress
dinesh@Dineshs-MacBook-Pro ~ % 
  1. The cluster version operator should be healthy
dinesh@Dineshs-MacBook-Pro ~ % oc get -o json clusterversion version | jq -r '.status.conditions[] | .type + "=" + .status + " " ' 
RetrievedUpdates=False 
ImplicitlyEnabledCapabilities=False 
ReleaseAccepted=True 
Available=True 
Failing=False 
Progressing=False 
dinesh@Dineshs-MacBook-Pro ~ % 
  1. There should be no error logs mentioning connection refused, i/o timeout, or no route to host
dinesh@Dineshs-MacBook-Pro ~ % oc logs -n openshift-cluster-version $(oc get pods -n openshift-cluster-version -o jsonpath='{.items[0].metadata.name}') | grep -iE "connection refused|i/o timeout|no route to host"
dinesh@Dineshs-MacBook-Pro ~ %
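A reusable variant of the log check above, as a sketch. It reads the logs through `deployment/cluster-version-operator` (the deployment name is inferred from the pod name `cluster-version-operator-7b55dd47b7-4fwf4` shown earlier), so it does not depend on which pod `{.items[0]}` happens to return. The function names are hypothetical.

```shell
# Pure filter: counts stdin lines matching common network-failure messages
# (the same patterns as the grep above), case-insensitively.
count_network_errors() {
  # grep -c prints 0 but exits 1 when nothing matches; treat that as success.
  grep -ciE 'connection refused|i/o timeout|no route to host' || true
}

# Cluster-facing (requires oc): count such lines in the current CVO logs.
cvo_network_errors() {
  oc logs -n openshift-cluster-version deployment/cluster-version-operator |
    count_network_errors
}
```

A result of `0` from `cvo_network_errors` corresponds to the empty grep output above.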

@dis016

dis016 commented Jul 3, 2025

Test Scenario: CVO can connect to the production/external OSUS (Cincinnati) and fetch the upgrade graph for the current version in a connected cluster environment.

Note: for pre-merge testing, we are using https://fauxinnati-fauxinnati.apps.ota-stage.q2z4.p1.openshiftapps.com/ as a stand-in for the production service.

  1. Install a connected cluster
dinesh@Dineshs-MacBook-Pro ~ % oc get clusterversion 
NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.0-0-2025-07-03-024645-test-ci-ln-3p0b0sk-latest   True        False         18m     Cluster version is 4.20.0-0-2025-07-03-024645-test-ci-ln-3p0b0sk-latest
dinesh@Dineshs-MacBook-Pro ~ % 
  1. Patch the upstream to point at the external/production graph
dinesh@Dineshs-MacBook-Pro ~ % oc patch clusterversion version --type=merge -p '{"spec":{"upstream":"https://fauxinnati-fauxinnati.apps.ota-stage.q2z4.p1.openshiftapps.com/api/upgrades_info/graph"}}'
clusterversion.config.openshift.io/version patched
dinesh@Dineshs-MacBook-Pro ~ %

The upstream should have the expected value:

dinesh@Dineshs-MacBook-Pro ~ % oc get clusterversion version -o jsonpath='{.spec.upstream}'
https://fauxinnati-fauxinnati.apps.ota-stage.q2z4.p1.openshiftapps.com/api/upgrades_info/graph
dinesh@Dineshs-MacBook-Pro ~ % 
  1. Set the channel to simple to demonstrate with fauxinnati
dinesh@Dineshs-MacBook-Pro ~ % oc adm upgrade channel simple
warning: No channels known to be compatible with the current version "4.20.0-0-2025-07-03-024645-test-ci-ln-3p0b0sk-latest"; unable to validate "simple". Setting the update channel to "simple" anyway.
dinesh@Dineshs-MacBook-Pro ~ %
  1. The upgrade path from the external/production OSUS should be available in oc adm upgrade

dinesh@Dineshs-MacBook-Pro ~ % oc adm upgrade 
Cluster version is 4.20.0-0-2025-07-03-024645-test-ci-ln-3p0b0sk-latest

Upstream: https://fauxinnati-fauxinnati.apps.ota-stage.q2z4.p1.openshiftapps.com/api/upgrades_info/graph
Channel: simple

Recommended updates:

  VERSION     IMAGE
  4.21.0      quay.io/openshift-release-dev/ocp-release@sha256:00000000000000000000000000000000000000000000000000000000003d5b08
  4.20.1      quay.io/openshift-release-dev/ocp-release@sha256:00000000000000000000000000000000000000000000000000000000003d5721
dinesh@Dineshs-MacBook-Pro ~ % 
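To check this programmatically rather than by eye, the recommended versions can be pulled out of the `oc adm upgrade` output. A sketch that relies on the table layout shown above (an assumption: the human-readable format may change between `oc` versions); reading `.status.availableUpdates` from the ClusterVersion with a JSONPath `range` expression is shown as a more stable alternative. The function names are hypothetical.

```shell
# Pure parser: prints the VERSION column of the "Recommended updates:" table
# of `oc adm upgrade` output on stdin, one version per line.
recommended_versions() {
  awk '
    /^Recommended updates:/       { in_section = 1; next }
    in_section && $1 == "VERSION" { in_table = 1; next }
    in_table && NF == 0           { exit }
    in_table                      { print $1 }
  '
}

# Cluster-facing alternative (requires oc): read the versions from the API.
available_update_versions() {
  oc get clusterversion version \
    -o jsonpath='{range .status.availableUpdates[*]}{.version}{"\n"}{end}'
}
```

With the output above, `oc adm upgrade | recommended_versions` would print `4.21.0` and `4.20.1`.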

@dis016

dis016 commented Jul 3, 2025

Test Scenario: CVO is able to evaluate conditional risks for an upgrade path.

  1. Install a cluster with the CVO network policy.
dinesh@Dineshs-MacBook-Pro ~ % oc get clusterversion 
NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.0-0-2025-07-03-024645-test-ci-ln-3p0b0sk-latest   True        False         18m     Cluster version is 4.20.0-0-2025-07-03-024645-test-ci-ln-3p0b0sk-latest
dinesh@Dineshs-MacBook-Pro ~ % 
  1. Patch the upstream to a custom graph from GitHub with an invalid PromQL query, like the one below:
    https://raw.githubusercontent.com/dis016/upgrade-cincy/refs/heads/master/cincy-conditional-edge-invalid-promql.json
{
  "nodes": [
    {
      "version": "4.20.0-0-2025-07-03-024645-test-ci-ln-3p0b0sk-latest",
      "payload": "registry.build07.ci.openshift.org/ci-ln-3p0b0sk/release@sha256:03705561230fc8a5bac03b86266ad62c27c70971ec869006139241ac3ff721b0"
    },
    {
      "version": "4.20.1",
      "payload": "quay.io/openshift-release-dev/ocp-release@sha256:00000000000000000000000000000000000000000000000000000000003d5721"
    }
  ],
  "conditionalEdges":[
    {
      "edges": [
        {"from": "4.20.0-0-2025-07-03-024645-test-ci-ln-3p0b0sk-latest", "to": "4.20.1"}
      ],
      "risks": [
        {
          "url": "https://invalid.com/a",
          "name": "InvalidPromQL",
          "message": "Invalid Promql",
          "matchingRules": [
            {
              "type": "PromQL",
              "promql": {
                "promql": "0 * group1(cluster_version)"
              }
            }
          ]
        }
      ]
    }
  ]
}

Patch the upstream

dinesh@Dineshs-MacBook-Pro ~ % oc patch clusterversion version --type=merge -p '{"spec":{"upstream":"https://raw.githubusercontent.com/dis016/upgrade-cincy/refs/heads/master/cincy-conditional-edge-invalid-promql.json"}}'
clusterversion.config.openshift.io/version patched
dinesh@Dineshs-MacBook-Pro ~ % 
  1. CVO should report the invalid PromQL error for the conditional update path
dinesh@Dineshs-MacBook-Pro ~ % oc adm upgrade --include-not-recommended
Cluster version is 4.20.0-0-2025-07-03-024645-test-ci-ln-3p0b0sk-latest

Upstream: https://raw.githubusercontent.com/dis016/upgrade-cincy/refs/heads/master/cincy-conditional-edge-invalid-promql.json
Channel: simple
No updates available. You may still upgrade to a specific release image with --to-image or wait for new updates to be available.

Updates with known issues:

  Version: 4.20.1
  Image: quay.io/openshift-release-dev/ocp-release@sha256:00000000000000000000000000000000000000000000000000000000003d5721
  Reason: EvaluationFailed
  Message: Could not evaluate exposure to update risk InvalidPromQL (executing PromQL query: bad_data: 1:5: parse error: unknown function with name "group1")
    InvalidPromQL description: Invalid Promql
    InvalidPromQL URL: https://invalid.com/a
dinesh@Dineshs-MacBook-Pro ~ % 
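The `EvaluationFailed` reason with a PromQL parse error (`bad_data`) is a useful signal: the error is returned by the query endpoint, which suggests CVO did reach the monitoring stack and the failure lies in the query itself rather than in network access. A sketch for extracting that reason for a given version, assuming the `Version:` / `Reason:` layout shown above; `risk_reason` is a hypothetical name.

```shell
# Pure parser: prints the Reason reported for version $1 in the
# "Updates with known issues" part of
# `oc adm upgrade --include-not-recommended` output on stdin.
risk_reason() {
  awk -v want="${1:?usage: risk_reason VERSION}" '
    $1 == "Version:"                   { current = $2; next }
    current == want && $1 == "Reason:" { print $2; exit }
  '
}
```

In this scenario, `oc adm upgrade --include-not-recommended | risk_reason 4.20.1` would be expected to print `EvaluationFailed`.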

@dis016

dis016 commented Jul 3, 2025

Test Scenario: CVO should be able to verify signatures for upgrade in the connected cluster environment

  1. Install a connected cluster with the CVO network policy.
dinesh@Dineshs-MacBook-Pro ~ % oc get clusterversion 
NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.0-0-2025-07-03-024645-test-ci-ln-3p0b0sk-latest   True        False         18m     Cluster version is 4.20.0-0-2025-07-03-024645-test-ci-ln-3p0b0sk-latest
dinesh@Dineshs-MacBook-Pro ~ % 
  1. Trigger an upgrade to an unsigned target without --force
dinesh@Dineshs-MacBook-Pro ~ % oc image info registry.ci.openshift.org/ocp/release:4.20.0-0.nightly-2025-07-01-051543  | grep "Digest" 
Digest:      sha256:ecff069ad9e1d1e72ac334a5b5ade7ff8e68a2a3aa2bf7e56cd4b914e681dd6f
dinesh@Dineshs-MacBook-Pro ~ % 


dinesh@Dineshs-MacBook-Pro ~ % oc adm upgrade --allow-explicit-upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:ecff069ad9e1d1e72ac334a5b5ade7ff8e68a2a3aa2bf7e56cd4b914e681dd6f 
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade for the update to proceed anyway
Requested update to release image registry.ci.openshift.org/ocp/release@sha256:ecff069ad9e1d1e72ac334a5b5ade7ff8e68a2a3aa2bf7e56cd4b914e681dd6f
dinesh@Dineshs-MacBook-Pro ~ %
  1. The upgrade should not start, and a signature verification error should be reported
dinesh@Dineshs-MacBook-Pro ~ % oc adm upgrade 
Cluster version is 4.20.0-0-2025-07-03-024645-test-ci-ln-3p0b0sk-latest

ReleaseAccepted=False

  Reason: RetrievePayload
  Message: Retrieving payload failed version="" image="registry.ci.openshift.org/ocp/release@sha256:ecff069ad9e1d1e72ac334a5b5ade7ff8e68a2a3aa2bf7e56cd4b914e681dd6f" failure=The update cannot be verified: unable to verify sha256:ecff069ad9e1d1e72ac334a5b5ade7ff8e68a2a3aa2bf7e56cd4b914e681dd6f against keyrings: verifier-public-key-redhat

warning: Cannot display available updates:
  Reason: NoChannel
  Message: The update channel has not been configured.

dinesh@Dineshs-MacBook-Pro ~ % oc get clusterversion 
NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.0-0-2025-07-03-024645-test-ci-ln-3p0b0sk-latest   True        False         49m     Cluster version is 4.20.0-0-2025-07-03-024645-test-ci-ln-3p0b0sk-latest
dinesh@Dineshs-MacBook-Pro ~ % 
  1. Clear the upgrade
dinesh@Dineshs-MacBook-Pro ~ % oc adm upgrade --clear 
Cancelled requested upgrade to registry.ci.openshift.org/ocp/release@sha256:ecff069ad9e1d1e72ac334a5b5ade7ff8e68a2a3aa2bf7e56cd4b914e681dd6f
dinesh@Dineshs-MacBook-Pro ~ % 
  1. Upgrade to a signed target without --force
dinesh@Dineshs-MacBook-Pro ~ % oc image info quay.io/openshift-release-dev/ocp-release:4.20.0-ec.3-x86_64  | grep "Digest" 
Digest:      sha256:4dfd7223e883a685c7be0906b09d573ef24bdb8f7fcfb1876e198bed5352ba55
dinesh@Dineshs-MacBook-Pro ~ %

dinesh@Dineshs-MacBook-Pro ~ % oc adm upgrade --allow-explicit-upgrade --to-image quay.io/openshift-release-dev/ocp-release@sha256:4dfd7223e883a685c7be0906b09d573ef24bdb8f7fcfb1876e198bed5352ba55  
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade for the update to proceed anyway
Requested update to release image quay.io/openshift-release-dev/ocp-release@sha256:4dfd7223e883a685c7be0906b09d573ef24bdb8f7fcfb1876e198bed5352ba55
dinesh@Dineshs-MacBook-Pro ~ %
  1. The upgrade should start and be in progress
dinesh@Dineshs-MacBook-Pro ~ % oc adm upgrade 
info: An upgrade is in progress. Working towards 4.20.0-ec.3: 69 of 935 done (7% complete), waiting on config-operator

Upgradeable=False

  Reason: UpdateInProgress
  Message: An update is already in progress and the details are in the Progressing condition

warning: Cannot display available updates:
  Reason: NoChannel
  Message: The update channel has not been configured.

dinesh@Dineshs-MacBook-Pro ~ % oc get clusterversion 
NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.0-0-2025-07-03-024645-test-ci-ln-3p0b0sk-latest   True        True          24s     Working towards 4.20.0-ec.3: 113 of 935 done (12% complete), waiting on etcd, kube-apiserver
dinesh@Dineshs-MacBook-Pro ~ %

@dis016

dis016 commented Jul 3, 2025

Testing results look good so far; nothing looks suspicious. I'd like to test the behaviour in a disconnected environment in the following days. If that looks good, I'll update here and this is good to go.

@hongkailiu
Member

hongkailiu commented Jul 3, 2025

/lgtm

The pull itself looks good to me.
I am still checking the testing cases in the working doc and what Dinesh did above.
I will bring my questions if I find something.


Update.
The cases look sufficient to me.
Made a summary here.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 3, 2025
Contributor

openshift-ci bot commented Jul 3, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hongkailiu, petr-muller

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:
  • OWNERS [hongkailiu,petr-muller]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dis016

dis016 commented Jul 4, 2025

There is no need to test specifically in a disconnected environment. The external network is blocked there, but an equivalent service is available on the intranet, so it makes no difference whether the cluster is connected or disconnected.

@dis016

dis016 commented Jul 4, 2025

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Jul 4, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented Jul 4, 2025

@petr-muller: This pull request references OTA-1521 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

Add a baseline NetworkPolicy to deny all network communication (both ingress and egress) to all pods in the namespace. Any necessary network traffic needs to be allowed by an additional NetworkPolicy resource (they are additive).

At the moment, the default deny all policy should be the only one needed:

See OTA Network Policies Working Document for more information.
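For reference, a default-deny policy of the kind described above selects every pod with an empty `podSelector` and lists both `Ingress` and `Egress` under `policyTypes` without any allow rules. Below is a minimal sketch. It assumes the CVO namespace is `openshift-cluster-version`. The policy name `default-deny` and the `default_deny_manifest` helper are hypothetical, and the manifest actually merged in this PR may differ:

```shell
# Hypothetical helper: print a default-deny NetworkPolicy for the namespace
# given as $1. The empty podSelector selects every pod in the namespace;
# declaring both policy types with no ingress/egress rules denies all
# traffic in both directions.
default_deny_manifest() {
  ns="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: ${ns}
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
EOF
}

# Example (requires cluster-admin access):
#   default_deny_manifest openshift-cluster-version | oc apply -f -
```

Because NetworkPolicies are additive, any traffic the operator does need would be allowed by a separate policy selecting the same pods, rather than by editing this one.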


@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 2d837d9 and 2 for PR HEAD 07c6b5c in total

1 similar comment

Contributor

openshift-ci bot commented Jul 4, 2025

@petr-muller: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 0947d3e into openshift:main Jul 4, 2025
16 checks passed
@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: cluster-version-operator
This PR has been included in build cluster-version-operator-container-v4.20.0-202507041544.p0.g0947d3e.assembly.stream.el9.
All builds following this will include this PR.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR
Projects
None yet

5 participants