Commit ba4e86f

🐙 octavia-cli: secret management (#10885)
1 parent ad20f00 commit ba4e86f

13 files changed, +141 -64 lines changed

octavia-cli/README.md

+12 -1
@@ -34,11 +34,22 @@ Octavia is currently under development.
 You can find a detailed and updated execution plan [here](https://docs.google.com/spreadsheets/d/1weB9nf0Zx3IR_QvpkxtjBAzyfGb7B0PWpsVt6iMB5Us/edit#gid=0).
 We welcome community contributions!
 
+# Secret management
+Source and destination configurations have credential fields that you **do not want to store as plain text or version in Git**.
+`octavia` offers secret management through environment variable expansion:
+```yaml
+configuration:
+  password: ${MY_PASSWORD}
+```
+If you have set a `MY_PASSWORD` environment variable, `octavia apply` will load its value into the `password` field.
+
 **Summary of achievements**:
 
 | Date | Milestone |
 |------------|-------------------------------------|
-| 2022-03-04 | Implement `octavia apply` for connections|
+| 2022-03-09 | Implement secret management through environment variable expansion |
+| 2022-03-09 | Implement `octavia generate connection`|
+| 2022-03-09 | Implement `octavia apply` for connections|
 | 2022-03-02 | Implement `octavia apply` (sources and destination only)|
 | 2022-02-06 | Implement `octavia generate` (sources and destination only)|
 | 2022-01-25 | Implement `octavia init` + some context checks|
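
The expansion described in the README change can be illustrated with Python's standard `os.path.expandvars`, which is the function the loader added in this commit calls on matching YAML values. This is a minimal sketch, not `octavia` code: the `expand` helper, the `MY_PASSWORD` value, and the `NOT_SET_ANYWHERE` name are assumptions made for the demonstration.

```python
import os


def expand(raw_value: str) -> str:
    """Expand $NAME and ${NAME} references using the current environment."""
    return os.path.expandvars(raw_value)


# Hypothetical secret, set here only for the demonstration.
os.environ["MY_PASSWORD"] = "s3cr3t"
# Make sure the second variable is really undefined.
os.environ.pop("NOT_SET_ANYWHERE", None)

print(expand("${MY_PASSWORD}"))       # prints the value of MY_PASSWORD
print(expand("${NOT_SET_ANYWHERE}"))  # printed unchanged: the variable is undefined
```

`os.path.expandvars` leaves references to undefined variables untouched, so a missing variable would presumably reach the configuration as the literal `${...}` string rather than raise an error.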

octavia-cli/integration_tests/test_generate/expected_rendered_yaml/destination_postgres/expected.yaml

+3 -3
@@ -13,7 +13,7 @@ configuration:
   database: # REQUIRED | string | Name of the database.
   schema: "public" # REQUIRED | string | The default schema tables are written to if the source does not specify a namespace. The usual value for this field is "public". | Example: public
   username: # REQUIRED | string | Username to use to access the database.
-  password: # SECRET | OPTIONAL | string | Password associated with the username.
+  password: ${PASSWORD} # SECRET (please store in environment variables) | OPTIONAL | string | Password associated with the username.
   ssl: # OPTIONAL | boolean | Encrypt data using SSL.
   tunnel_method:
     ## -------- Pick one valid structure among the examples below: --------
@@ -23,10 +23,10 @@ configuration:
     # tunnel_host: # REQUIRED | string | Hostname of the jump server host that allows inbound ssh tunnel.
     # tunnel_port: 22 # REQUIRED | integer | Port on the proxy/jump server that accepts inbound ssh connections. | Example: 22
     # tunnel_user: # REQUIRED | string | OS-level username for logging into the jump server host.
-    # ssh_key: # SECRET | REQUIRED | string | OS-level user account ssh key credentials in RSA PEM format ( created with ssh-keygen -t rsa -m PEM -f myuser_rsa )
+    # ssh_key: ${SSH_KEY} # SECRET (please store in environment variables) | REQUIRED | string | OS-level user account ssh key credentials in RSA PEM format ( created with ssh-keygen -t rsa -m PEM -f myuser_rsa )
     ## -------- Another valid structure for tunnel_method: --------
     # tunnel_method: "SSH_PASSWORD_AUTH" # REQUIRED | string | Connect through a jump server tunnel host using username and password authentication
     # tunnel_host: # REQUIRED | string | Hostname of the jump server host that allows inbound ssh tunnel.
     # tunnel_port: 22 # REQUIRED | integer | Port on the proxy/jump server that accepts inbound ssh connections. | Example: 22
     # tunnel_user: # REQUIRED | string | OS-level username for logging into the jump server host
-    # tunnel_user_password: # SECRET | REQUIRED | string | OS-level password for logging into the jump server host
+    # tunnel_user_password: ${TUNNEL_USER_PASSWORD} # SECRET (please store in environment variables) | REQUIRED | string | OS-level password for logging into the jump server host

octavia-cli/integration_tests/test_generate/expected_rendered_yaml/destination_s3/expected.yaml

+2 -2
@@ -12,8 +12,8 @@ configuration:
   s3_bucket_name: # REQUIRED | string | The name of the S3 bucket. | Example: airbyte_sync
   s3_bucket_path: # REQUIRED | string | Directory under the S3 bucket where data will be written. | Example: data_sync/test
   s3_bucket_region: # REQUIRED | string | The region of the S3 bucket.
-  access_key_id: # SECRET | OPTIONAL | string | The access key id to access the S3 bucket. Airbyte requires Read and Write permissions to the given bucket, if not set, Airbyte will rely on Instance Profile. | Example: A012345678910EXAMPLE
-  secret_access_key: # SECRET | OPTIONAL | string | The corresponding secret to the access key id, if S3 Key Id is set, then S3 Access Key must also be provided | Example: a012345678910ABCDEFGH/AbCdEfGhEXAMPLEKEY
+  access_key_id: ${ACCESS_KEY_ID} # SECRET (please store in environment variables) | OPTIONAL | string | The access key id to access the S3 bucket. Airbyte requires Read and Write permissions to the given bucket, if not set, Airbyte will rely on Instance Profile. | Example: A012345678910EXAMPLE
+  secret_access_key: ${SECRET_ACCESS_KEY} # SECRET (please store in environment variables) | OPTIONAL | string | The corresponding secret to the access key id, if S3 Key Id is set, then S3 Access Key must also be provided | Example: a012345678910ABCDEFGH/AbCdEfGhEXAMPLEKEY
   format:
     ## -------- Pick one valid structure among the examples below: --------
     format_type: "Avro" # REQUIRED | string}

octavia-cli/integration_tests/test_generate/expected_rendered_yaml/source_postgres/expected.yaml

+3 -3
@@ -13,7 +13,7 @@ configuration:
   database: # REQUIRED | string | Name of the database.
   schemas: ["public"] # OPTIONAL | array | The list of schemas to sync from. Defaults to user. Case sensitive.
   username: # REQUIRED | string | Username to use to access the database.
-  password: # SECRET | OPTIONAL | string | Password associated with the username.
+  password: ${PASSWORD} # SECRET (please store in environment variables) | OPTIONAL | string | Password associated with the username.
   ssl: # OPTIONAL | boolean | Encrypt client/server communications for increased security.
   replication_method:
     ## -------- Pick one valid structure among the examples below: --------
@@ -31,10 +31,10 @@ configuration:
     # tunnel_host: # REQUIRED | string | Hostname of the jump server host that allows inbound ssh tunnel.
     # tunnel_port: 22 # REQUIRED | integer | Port on the proxy/jump server that accepts inbound ssh connections. | Example: 22
     # tunnel_user: # REQUIRED | string | OS-level username for logging into the jump server host.
-    # ssh_key: # SECRET | REQUIRED | string | OS-level user account ssh key credentials in RSA PEM format ( created with ssh-keygen -t rsa -m PEM -f myuser_rsa )
+    # ssh_key: ${SSH_KEY} # SECRET (please store in environment variables) | REQUIRED | string | OS-level user account ssh key credentials in RSA PEM format ( created with ssh-keygen -t rsa -m PEM -f myuser_rsa )
     ## -------- Another valid structure for tunnel_method: --------
     # tunnel_method: "SSH_PASSWORD_AUTH" # REQUIRED | string | Connect through a jump server tunnel host using username and password authentication
     # tunnel_host: # REQUIRED | string | Hostname of the jump server host that allows inbound ssh tunnel.
     # tunnel_port: 22 # REQUIRED | integer | Port on the proxy/jump server that accepts inbound ssh connections. | Example: 22
     # tunnel_user: # REQUIRED | string | OS-level username for logging into the jump server host
-    # tunnel_user_password: # SECRET | REQUIRED | string | OS-level password for logging into the jump server host
+    # tunnel_user_password: ${TUNNEL_USER_PASSWORD} # SECRET (please store in environment variables) | REQUIRED | string | OS-level password for logging into the jump server host

octavia-cli/octavia_cli/apply/diff_helpers.py

+7 -12
@@ -3,6 +3,7 @@
 #
 
 import hashlib
+import json
 from typing import Any
 
 import click
@@ -11,23 +12,17 @@
 SECRET_MASK = "**********"
 
 
-def compute_checksum(file_path: str) -> str:
-    """Compute SHA256 checksum from a file
+def hash_config(configuration: dict) -> str:
+    """Computes a SHA256 hash from a dictionary.
 
     Args:
-        file_path (str): Path for the file for which you want to compute a checksum.
+        configuration (dict): The configuration to hash
 
     Returns:
-        str: The computed hash digest
+        str: The SHA256 hex digest of the configuration
     """
-    BLOCK_SIZE = 65536
-    file_hash = hashlib.sha256()
-    with open(file_path, "rb") as f:
-        fb = f.read(BLOCK_SIZE)
-        while len(fb) > 0:
-            file_hash.update(fb)
-            fb = f.read(BLOCK_SIZE)
-    return file_hash.hexdigest()
+    stringified = json.dumps(configuration, sort_keys=True)
+    return hashlib.sha256(stringified.encode("utf-8")).hexdigest()
 
 
 def exclude_secrets_from_diff(obj: Any, path: str) -> bool:
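
Hashing the parsed configuration instead of the raw file bytes means that key order in the YAML no longer affects the result, while any change to a value does. The sketch below reuses the `hash_config` body from the diff; the three sample dictionaries are made up for illustration and do not come from the repository.

```python
import hashlib
import json


def hash_config(configuration: dict) -> str:
    # Same logic as in the diff: serialize with sorted keys, then SHA256.
    stringified = json.dumps(configuration, sort_keys=True)
    return hashlib.sha256(stringified.encode("utf-8")).hexdigest()


# Illustrative configurations: same content, different key order.
config_a = {"resource_name": "pg", "configuration": {"host": "localhost", "port": 5432}}
config_b = {"configuration": {"port": 5432, "host": "localhost"}, "resource_name": "pg"}
# Same keys, but one value differs.
config_c = {"resource_name": "pg", "configuration": {"host": "localhost", "port": 5433}}

print(hash_config(config_a) == hash_config(config_b))  # True
print(hash_config(config_a) == hash_config(config_c))  # False
```

Because `resources.py` hashes the configuration after it has been loaded with `EnvVarLoader`, a changed environment variable value should also change the hash, which a checksum of the file on disk would not detect.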

octavia-cli/octavia_cli/apply/resources.py

+16 -16
@@ -32,7 +32,8 @@
 from airbyte_api_client.model.source_update import SourceUpdate
 from click import ClickException
 
-from .diff_helpers import compute_checksum, compute_diff
+from .diff_helpers import compute_diff, hash_config
+from .yaml_loaders import EnvVarLoader
 
 
 class DuplicateResourceError(ClickException):
@@ -48,27 +49,27 @@ class InvalidConfigurationError(ClickException):
 
 
 class ResourceState:
-    def __init__(self, configuration_path: str, resource_id: str, generation_timestamp: int, configuration_checksum: str):
+    def __init__(self, configuration_path: str, resource_id: str, generation_timestamp: int, configuration_hash: str):
         """This constructor is meant to be private. Construction shall be made with create or from_file class methods.
 
         Args:
-            configuration_path (str): Path to the configuration path the state relates to.
+            configuration_path (str): Path to the configuration this state relates to.
             resource_id (str): Id of the resource the state relates to.
             generation_timestamp (int): State generation timestamp.
-            configuration_checksum (str): Checksum of the configuration file.
+            configuration_hash (str): Hash of the configuration.
         """
         self.configuration_path = configuration_path
         self.resource_id = resource_id
         self.generation_timestamp = generation_timestamp
-        self.configuration_checksum = configuration_checksum
+        self.configuration_hash = configuration_hash
         self.path = os.path.join(os.path.dirname(self.configuration_path), "state.yaml")
 
     def as_dict(self):
         return {
-            "configuration_path": self.configuration_path,
             "resource_id": self.resource_id,
             "generation_timestamp": self.generation_timestamp,
-            "configuration_checksum": self.configuration_checksum,
+            "configuration_path": self.configuration_path,
+            "configuration_hash": self.configuration_hash,
         }
 
     def _save(self) -> None:
@@ -77,19 +78,20 @@ def _save(self) -> None:
             yaml.dump(self.as_dict(), state_file)
 
     @classmethod
-    def create(cls, configuration_path: str, resource_id: str) -> "ResourceState":
+    def create(cls, configuration_path: str, configuration: dict, resource_id: str) -> "ResourceState":
         """Create a state for a resource configuration.
 
         Args:
             configuration_path (str): Path to the YAML file defining the resource.
+            configuration (dict): Configuration object that will be hashed.
             resource_id (str): UUID of the resource.
 
         Returns:
             ResourceState: state representing the resource.
         """
         generation_timestamp = int(time.time())
-        configuration_checksum = compute_checksum(configuration_path)
-        state = ResourceState(configuration_path, resource_id, generation_timestamp, configuration_checksum)
+        configuration_hash = hash_config(configuration)
+        state = ResourceState(configuration_path, resource_id, generation_timestamp, configuration_hash)
         state._save()
         return state
 
@@ -109,7 +111,7 @@ def from_file(cls, file_path: str) -> "ResourceState":
             raw_state["configuration_path"],
             raw_state["resource_id"],
             raw_state["generation_timestamp"],
-            raw_state["configuration_checksum"],
+            raw_state["configuration_hash"],
         )
 
 
@@ -198,9 +200,7 @@ def __init__(
         self.configuration_path = configuration_path
         self.api_instance = self.api(api_client)
         self.state = self._get_state_from_file()
-        self.local_file_changed = (
-            True if self.state is None else compute_checksum(self.configuration_path) != self.state.configuration_checksum
-        )
+        self.local_file_changed = True if self.state is None else hash_config(self.local_configuration) != self.state.configuration_hash
 
     @property
     def remote_resource(self):
@@ -308,7 +308,7 @@ def _create_or_update(
         """
         try:
             result = operation_fn(self.api_instance, payload, _check_return_type=_check_return_type)
-            return result, ResourceState.create(self.configuration_path, result[self.resource_id_field])
+            return result, ResourceState.create(self.configuration_path, self.local_configuration, result[self.resource_id_field])
         except airbyte_api_client.ApiException as api_error:
             if api_error.status == 422:
                 # This API response error is really verbose, but it embodies all the details about why the config is not valid.
@@ -554,7 +554,7 @@ def factory(api_client: airbyte_api_client.ApiClient, workspace_id: str, configu
         Union[Source, Destination, Connection]: The resource object created from the YAML config.
     """
     with open(configuration_path, "r") as f:
-        local_configuration = yaml.safe_load(f)
+        local_configuration = yaml.load(f, EnvVarLoader)
     if local_configuration["definition_type"] == "source":
         return Source(api_client, workspace_id, local_configuration, configuration_path)
     if local_configuration["definition_type"] == "destination":

@@ -0,0 +1,35 @@
+#
+# Copyright (c) 2021 Airbyte, Inc., all rights reserved.
+#
+
+import os
+import re
+from typing import Any
+
+import yaml
+
+ENV_VAR_MATCHER_PATTERN = re.compile(r".*\$\{([^}^{]+)\}.*")
+
+
+def env_var_replacer(loader: yaml.Loader, node: yaml.Node) -> Any:
+    """Convert a YAML node to a Python object, expanding environment variables.
+
+    Args:
+        loader (yaml.Loader): Not used
+        node (yaml.Node): Yaml node to convert to python object
+
+    Returns:
+        Any: Python object with expanded vars.
+    """
+    return os.path.expandvars(node.value)
+
+
+class EnvVarLoader(yaml.SafeLoader):
+    pass
+
+
+# All yaml nodes matching the regex will be tagged as !environment_variable.
+EnvVarLoader.add_implicit_resolver("!environment_variable", ENV_VAR_MATCHER_PATTERN, None)
+
+# All yaml nodes tagged as !environment_variable will be constructed with the env_var_replacer callback.
+EnvVarLoader.add_constructor("!environment_variable", env_var_replacer)
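
To see which values the loader above would tag for expansion, the regular expression can be tried on its own with the standard library. This is only a sketch of the matching step: the pattern is copied from the diff, while the `is_tagged` helper and the sample strings are illustrative and not part of `octavia`.

```python
import re

# Pattern copied from the diff above.
ENV_VAR_MATCHER_PATTERN = re.compile(r".*\$\{([^}^{]+)\}.*")


def is_tagged(scalar: str) -> bool:
    """Return True when the scalar matches the environment variable pattern."""
    return ENV_VAR_MATCHER_PATTERN.match(scalar) is not None


print(is_tagged("${MY_PASSWORD}"))             # True
print(is_tagged("jdbc://${DB_HOST}:5432/db"))  # True: the reference may be embedded
print(is_tagged("$MY_PASSWORD"))               # False: braces are required
print(is_tagged("plain text"))                 # False
```

Only the braced `${NAME}` form triggers the tag, which matches the placeholders written by `octavia generate` in the expected YAML files.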

octavia-cli/octavia_cli/generate/renderers.py

+3 -1
@@ -93,7 +93,7 @@ def _get_type_comment(self) -> str:
         return self.type if self.type else None
 
     def _get_secret_comment(self) -> str:
-        return "SECRET" if self.airbyte_secret else None
+        return "SECRET (please store in environment variables)" if self.airbyte_secret else None
 
     def _get_description_comment(self) -> str:
         return self.description if self.description else None
@@ -113,6 +113,8 @@ def _get_example_comment(self) -> str:
     def _get_default(self) -> str:
         if self.const:
             return self.const
+        if self.airbyte_secret:
+            return f"${{{self.name.upper()}}}"
         return self.default
 
     @staticmethod
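
The triple braces in the new `_get_default` branch are easy to misread, so here is a small standalone sketch of the same f-string. The `secret_placeholder` function name is hypothetical; only the f-string expression is taken from the diff.

```python
def secret_placeholder(field_name: str) -> str:
    """Build a ${FIELD_NAME} placeholder from a field name."""
    # In an f-string, "{{" and "}}" are literal braces, so this yields ${NAME}.
    return f"${{{field_name.upper()}}}"


print(secret_placeholder("password"))              # ${PASSWORD}
print(secret_placeholder("tunnel_user_password"))  # ${TUNNEL_USER_PASSWORD}
```

Since the placeholder is derived from the field name alone, two resources that both have a `password` field get the same `${PASSWORD}` default, as the Postgres source and destination expected files show; the variable name in the generated YAML can be edited by hand if separate secrets are needed.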

octavia-cli/octavia_cli/generate/templates/source_or_destination.yaml.j2

+1 -1
@@ -7,7 +7,7 @@ definition_image: {{ definition.docker_repository }}
 definition_version: {{ definition.docker_image_tag }}
 
 {%- macro render_field(field, is_commented) %}
-{%- if is_commented %}# {% endif %}{{ field.name }}:{% if field.default %} {{ field.default | tojson() }}{% endif %} # {{ field.comment }}
+{%- if is_commented %}# {% endif %}{{ field.name }}:{% if field.default %} {% if field.airbyte_secret %}{{ field.default }}{% else %}{{ field.default | tojson() }}{% endif %}{% endif %} # {{ field.comment }}
 {%- endmacro %}
 
 {%- macro render_sub_fields(sub_fields, is_commented) %}

octavia-cli/unit_tests/test_apply/test_diff_helpers.py

+3 -7
@@ -2,17 +2,13 @@
 # Copyright (c) 2021 Airbyte, Inc., all rights reserved.
 #
 
-from unittest.mock import mock_open, patch
-
 import pytest
 from octavia_cli.apply import diff_helpers
 
 
-def test_compute_checksum(mocker):
-    with patch("builtins.open", mock_open(read_data=b"data")) as mock_file:
-        digest = diff_helpers.compute_checksum("test_file_path")
-    assert digest == "3a6eb0790f39ac87c94f3856b2dd2c5d110e6811602261a9a923d3bb23adc8b7"
-    mock_file.assert_called_with("test_file_path", "rb")
+def test_hash_config():
+    data_to_hash = {"example": "foo"}
+    assert diff_helpers.hash_config(data_to_hash) == "8d621bd700ff9a864bc603f56b4ec73536110b37d814dd4629767e898da70bef"
 
 
 @pytest.mark.parametrize(
