Skip to content

AGENT-997: Internal dev docs for authentication-authorization #9224

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
163 changes: 163 additions & 0 deletions docs/user/agent/authentication-authorization.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,163 @@
# Agent-Based Installer Authentication and Authorization - OpenShift 4.18

## Introduction
Since OpenShift 4.18, the Assisted Service (AS) APIs used by the Agent-Based Installer (ABI) have been secured through authentication and authorization mechanisms. This added security ensures that only authorized services can interact with AS. This document provides an overview of how the authentication and authorization mechanisms were implemented for ABI.

## Enabling Authentication and Authorization

To enable authentication and authorization, the `AuthType` setting must be configured in the `assisted-service.env.template` file. Set the following value:

```bash
AuthType = "agent-installer-local"
```

This setting ensures that the authentication mechanism required by the agent-based installer is activated.

## Token Workflow: Day 1
```mermaid
flowchart TD
A[Create Agent ISO] --> B[Generate 3 tokens: agentAuth, userAuth, watcherAuth. Also generate public key]
B --> C[Store tokens and base64 encoded public key in asset store hidden file '.openshift_state_install.json']
B --> D[Embed tokens and base64 encoded public key in /usr/local/share/assisted-service/assisted-service.env file in ISO]
D --> E[Boot hosts using ISO]
E --> F[Assisted service running on rendezvous host validates the token from HTTP requests coming from each node]
F --> G{Are tokens valid?}
G -->|Yes| H[Allow successful cluster installation]
G -->|No| I[Fail cluster installation]
```

## Tokens for Agent-Based Installer

When creating an Agent ISO (or PXE artifacts) for installing a new cluster, ABI generates three separate authentication tokens: `agentAuth`, `userAuth`, and `watcherAuth`. These tokens serve different purposes and have varying levels of access. The tokens are the part of the ignition that gets embeded in the CoreOS base ISO along with other ABI specific configurations for a cluster. The tokens are stored in `/usr/local/share/assisted-service/assisted-service.env` in the ISO. The base64 encoded public key required to validate these tokens is also saved in the same file. Additionally, this data also gets stored in a hidden file `.openshift_state_install.json` present in the directory location mentioned when creating ISO.

### Types of Tokens:
1. **agentAuth**:
- **Access Level**: Read-Write (RW)
- **Used By**: Agent services running in the live environment when host is booted.

2. **userAuth**:
- **Access Level**: Read-Write (RW)
- **Used By**: Systemd services (such as agent-register-infraenv.service, agent-register-cluster.service, etc) and the agent-installer-client in the Assisted Service.

3. **watcherAuth**:
- **Access Level**: Read-Only (RO)
- **Used By**: The monitoring commands (the `wait-for` command on Day 1 and `monitor-add-nodes` command on Day 2).

### Managing Access with Tokens and Security Schemes in swagger.yaml
Access control in swagger.yaml is managed through a combination of tokens and endpoint annotations. Each token corresponds to a specific security scheme and dictates the level of access based on the type of operation and annotation.

- agentAuth Token and Scheme:

A `GET` endpoint annotated with the `agentAuth` scheme grants `read-only (RO)` access.
A `POST` endpoint annotated with the same scheme enables `read-write (RW)` access.

- userAuth Token and Scheme:

Provides read-write (RW) access for both `GET` and `POST` endpoints.
- watcherAuth Token and Scheme:

Grants `read-only (RO)` access, applicable to `GET` endpoints only.

By combining the appropriate token and security annotations, `swagger.yaml` effectively controls access for various user personas.

### Token Expiry Behavior
- **Day 1** Tokens: The tokens generated on Day 1 never expire, meaning the agent ISO remains valid indefinitely. This is useful for use cases where customers create an agent ISO on one day and boot it on a later day (e.g., in a data center).
- **Day 2** Tokens: Tokens generated for nodes ISO on Day 2 expire after 48 hours. This ensures that the ISO is only valid for a short period.

### Day 2 token management
When a Day 2 node ISO is created, all three tokens are saved in the `openshift-config` namespace as a new secret named `agent-auth-token`. The following steps outline how token management works:
1. If the `openshift-config/agent-auth-token` secret does not exist in the cluster, the secret is created with the 3 tokens mentioned above along with the public key.
2. If the `openshift-config/agent-auth-token` secret already exists in the cluster, the tokens are retrieved from the secret (instead of being generated).
3. All tokens expire at the same time, so only one token needs to be checked for expiry.
4. If the token in the secret is older than 24 hours when creating a node ISO, the installer generates new tokens. These new tokens are saved in the asset store, and the secret in the cluster is updated with the new token values.
5. If the token in the secret is not older than 24 hours, the installer retrieves the existing token from the cluster, updates the asset store with the retrieved token, and uses the retrieved token for the new node ISO.
6. When a node is booted using the Day 2 `nodes.iso` to add worker nodes, the `agent-auth-token-status.service` systemd service will check if the authentication tokens have expired. If the tokens are expired, the following message will be displayed:

```
The authentication token has expired. Please generate a new ISO using the "oc adm node-image create" command, then reboot the node.
```

This ensures that the user is notified of expired tokens and can take appropriate action to generate a new ISO.

## Token Workflow: Day 2
```mermaid
flowchart TD
A[Check if openshift-config/agent-auth-token exists] -->|No| B[Create secret with 3 tokens and public key]
A -->|Yes| C[Retrieve tokens from existing secret]
B --> D[Generate new tokens if secret does not exist]
C --> D[Check if token is older than 24 hours]
D -->|Yes| E[Generate new tokens, update asset store, and secret]
D -->|No| F[Retrieve token, update asset store, use token for ISO]
F --> G[Node boots and checks token expiry]
G -->|Expired| H[Display message: 'Authentication token expired. Generate a new ISO.']
G -->|Not expired| I[Proceed with installation using retrieved token]
```

## Authorization

To implement authorization for ABI, a new security definition is added in the `swagger.yaml` file in AS. The new security definition is as follows:

```yaml
watcherAuth:
type: apiKey
in: header
name: Watcher-Authorization
```

The required endpoints are annotated with this security definition. For example:

```yaml
/v2/clusters:
get:
tags:
- installer
security:
- userAuth: [admin, read-only-admin, user]
- watcherAuth: []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For watcherAuth, what does an empty array ([]) mean? Shouldn't there be something like read-only listed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The empty array under watcherAuth indicates that no specific roles are required—this is intentional and specific to the ABI flow. Unlike the roles mentioned for userAuth, which is tied to RHSSO auth type and uses IAM roles like admin, user, etc., ABI doesn’t use IAM policies. Instead, access control is based on the endpoint behavior itself.

For example, the watcherAuth persona (like the wait-for user) is only allowed to perform safe read-only actions like GET, while userAuth might have both GET and POST access, implying read-write. So the roles aren't declared because ABI evaluates access by persona and method, not by static role mapping.

description: Retrieves the list of OpenShift clusters.
operationId: v2ListClusters
```

### Authorization Handler

The Assisted Service API includes an authorization handler implemented in the `pkg/auth/agent_local_authz_handler.go` file. The `AgentLocalAuthzHandler` is responsible for verifying that the `auth_scheme` in the token matches the required `authScheme` for the endpoint. The authorization logic works as follows:

1. The `auth_scheme` claim in the JWT token is extracted.
2. If the claim is malformed or missing, an error is returned.
3. The `auth_scheme` claim embedded in the token specifies the type of authorization (e.g., `watcherAuth`) and is compared with the `authScheme` expected by the endpoint. This ensures that the token matches the intended purpose of the request.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is /v2/clusters above the only endpoint that supports watcherAuth?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4. If the `auth_scheme` in the token does not match the `authScheme` required by the endpoint, or if an incorrect token header is provided, the request is rejected with a `Forbidden (403)` error.

Code implementation https://github.com/openshift/assisted-service/blame/973249d13ee9c83fd8a620a94956bcd16927b352/pkg/auth/agent_local_authz_handler.go#L46-L80

For example:
- Suppose the endpoint `/api/assisted-install/v2/infra-envs` requires the `watcherAuth` scheme.
- A request is made with the following command, but an **incorrect token header** is sent:
```bash
curl http://localhost:8090/api/assisted-install/v2/infra-envs \
-H "Authorization: eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdXRoX3NjaGVtZSI6IndhdGNoZXJBdXRoIn0.hJ5EuBOttDQ1-ANWUJlc97H1k63d-clDCaFUZ84XSYdF03A938OfMU6RHWltv0AiWZEHNJ0gLrjLxR-1yUYqiA"
```
- Although the token specifies the correct `auth_scheme` (`watcherAuth`), the **header sent is incorrect** for this request.

Expected response:
```json
{
"code": 403,
"message": "authClaim watcherAuth is unauthorized to access"
}
```

**Outcome:**
The system identifies that the provided header does not satisfy the endpoint’s requirements and correctly responds with an error. This confirms that tokens must not only align with the expected `authScheme` but also be passed in the correct format and header. The authorization logic ensures that only authorized requests can access protected endpoints in Assisted Service.

## Authentication API (Server Side)

The authentication server validates the token provided by the client using the `validateToken` function. The following steps explain the server-side validation process:

1. When an agent or nodes ISO is created, a set of public and private keys is generated. The private key is used to sign the JWT tokens, while the associated public key is made available to the API server.
2. The token is passed to the API server as part of the request. The API server validates the token by verifying its signature against the public key. This ensures the token was signed by the correct private key and has not been tampered with.
3. If the token is invalid or expired, an error is logged and the request is rejected with a `Unauthorized` error.
4. The claims in the token are extracted.
5. If the `exp` (expiration) claim is found, the server checks whether the token has expired.
- In the install workflow, the `exp` claim will not be present.
- In the add-nodes workflow, the token expiration is checked. If expired, a new token must be generated.
6. If the token is valid and not expired, the claims are returned and the request proceeds.